(57) ABSTRACT

Predictive modeling of consumer financial behavior is pro- 
vided by application of consumer transaction data to pre- 
dictive models associated with merchant segments. Mer- 
chant segments are derived from consumer transaction data 
based on co-occurrences of merchants in sequences of 
transactions. Merchant vectors representing specific mer- 
chants are clustered to form merchant segments in a vector 
space as a function of the degree to which merchants 
co-occur more or less frequently than expected. Each mer- 
chant segment is trained using consumer transaction data in 
selected past time periods to predict spending in subsequent 
time periods for a consumer based on previous spending by 
the consumer. Consumer profiles describe summary statis- 
tics of consumer spending in and across merchant segments. 
Analysis of consumers associated with a segment identifies 
selected consumers according to predicted spending in the 
segment or other criteria, and the targeting of promotional 
offers specific to the segment and its merchants. 

19 Claims, 10 Drawing Sheets 
SAMPLE MERCHANT SEGMENT INDEX

[FIG. 2: drawing reference numerals 200, 202, 204; segment description 210]

(1) : Direct Marketing: Housewares Appliances: Senior: CA: WA

(2) : Retail: Mall: Sporting Goods and Entertainment: Young adult 

(18) : Travel: Tourist: Golf: Traveler 

(19) : Retail: Department Stores: Furniture 

(20) : Retail: Mall: Clothing and Accessories: Male and Female 

(21) : Retail: Shoes: Furniture and Accessories 

(103) : Direct Marketing: Social Services: Religion 

(104) : Retail: Clothing: Family: SE Pennsylvania 

(105) : Direct Marketing: Internet and Catalog: PCs: Adult 

(106) : Retail: Housewares and Utilities: Homeowners 

(107) : Retail: Auto: Housewares: Virginia 

(108) : Retail: Housewares: Homeowners: CA: NV: WA 

(173) : Retail: Computers: Sports: Student: RI 

(174) : Services: Financial: Casinos: Gamblers: 

(175) : Retail: Home and Accessories 

(176) : Education: Tuition: Books: Student: RI 

(206) : Retail: Direct Market: Catalog: Women Clothing: Female 

(207) : Retail: Home Improvement: Female 

(208) : Direct Marketing: Catalog: Office Supplies: Business Owners 

(209) : Retail: Department Stores: General Merch: Youth 

(210) : Retail: Furniture: Recreation: Student: CA 

(211) : Direct Marketing: Catalog 

(212) : Retail: Sporting Goods: Tennis: Male 

(253) : Retail: Books: Electronics: Jewelry 

(254) : Recreation: Sports Fans: Hardware: Male: CA 

(255) : Direct Marketing: Electronics: Male 

(256) : Retail: Electronics: Office Supplies 

(257) : Retail: Electronics 

(258) : Retail: Yard and Garden: Automotive: NV 

(299) : Retail: Household: Yard and Garden: NV 

(300) : Direct Marketing: Catalog: Music 
PREDICTIVE MODELING OF CONSUMER FINANCIAL BEHAVIOR

BACKGROUND

1. Field of Invention

The present invention relates generally to analysis of consumer financial behavior, and more particularly to analyzing historical consumer financial behavior to accurately predict future spending behavior, and more particularly, future spending in specifically identified data-driven industry segments.

2. Background of Invention

Retailers, advertisers, and many other institutions are keenly interested in understanding consumer spending habits. These companies invest tremendous resources to identify and categorize consumer interests, in order to learn how consumers spend money. If the interests of an individual consumer can be determined, then it is believed that advertising and promotions related to these interests will be more successful in obtaining a positive consumer response, such as purchases of the advertised products or services.

Conventional means of determining consumer interests have generally relied on collecting demographic information about consumers, such as income, age, place of residence, occupation, and so forth, and associating various demographic categories with various categories of interests and merchants. Interest information may be collected from surveys, publication subscription lists, product warranty cards, and myriad other sources. Complex data processing is then applied to the source of data resulting in some demographic and interest description of each of a number of consumers.

This approach to understanding consumer behavior often misses the mark. The ultimate goal of this type of approach, whether acknowledged or not, is to predict consumer spending in the future. The assumption is that consumers will spend money on their interests, as expressed by things like their subscription lists and their demographics. Yet, the data on which the determination of interests is made is typically only indirectly related to the actual spending patterns of the consumer. For example, most publications have developed demographic models of their readership, and offer their subscription lists for sale to others interested in the particular demographics of the publication's readers. But subscription to a particular publication is a relatively poor indicator of what the consumer's spending patterns will be in the future.

Even taking into account multiple different sources of data, such as combining subscription lists, warranty registration cards, and so forth, still only yields an incomplete collection of unrelated data about a consumer.

One of the problems in these conventional approaches is that spending patterns are time based. That is, consumers spend money at merchants which are of interest to them in typically a time related manner. For example, a consumer who is a business traveler spends money on plane tickets, car rentals, hotel accommodations, restaurants, and entertainment all during a single business trip. These purchases together more strongly describe the consumer's true interests and preferences than any single one of the purchases alone. Yet conventional approaches to consumer analysis typically treat these purchases individually and as unrelated in time.

Yet another problem with conventional approaches is that categorization of purchases is often based on standardized industry classifications of merchants and business, such as the SIC codes. This set of classifications is entirely arbitrary, and has little to do with actual consumer behavior. Consumers do not decide which merchants to purchase from based on their SIC code. Thus, the use of arbitrary classifications to predict financial behavior is doomed to failure, since the classifications have little meaning in the actual data of consumer spending.

A third problem is that different groups of consumers spend money in different ways. For example, consumers who frequent high-end retailers have entirely different spending habits than consumers who are bargain shoppers. To deal with this problem, most systems focus exclusively on very specific, predefined types of consumers, in effect, assuming that the interests or types of consumers are known, and targeting these consumers with what are believed to be advertisements or promotions of interest to them. However, this approach essentially puts the cart before the proverbial horse: it assumes the interests and spending patterns of a particular group of consumers; it does not discover them from actual spending data. It thus begs the question as to whether the assumed group of consumers in fact even exists, or has the interests that are assumed for it.

Accordingly, what is needed is the ability to model consumer financial behavior based on actual historical spending patterns that reflect the time-related nature of each consumer's purchases. Further, it is desirable to extract meaningful classifications of merchants based on the actual spending patterns, and from the combination of these, predict future spending of an individual consumer in specific, meaningful merchant groupings.

In the application domain of information, and particularly text retrieval, vector based representations of documents and words are known. Vector space representations of documents are described in U.S. Pat. No. 5,619,709 issued to Caid et al. and in U.S. Pat. No. 5,325,298 issued to Gallant. Generally, vectors are used to represent words or documents. The relationships between words and between documents are learned and encoded in the vectors by a learning law. However, because these uses of vector space representations, including the context vectors of Caid, are designed primarily for information retrieval, they are not effective for predictive analysis of behavior when applied to documents such as credit card statements and the like. When the techniques of Caid were applied to the prediction problems, they had numerous shortcomings. First, they had problems dealing with high transaction count merchants. These are merchants whose names appear very frequently in the collections of transaction statements. Because Caid's system downplays the significance of frequently appearing terms, these high transaction frequency merchants were not being accurately represented. Excluding high transaction frequency merchants from the data set, however, undermines the system's ability to predict transactions in these important merchants. Second, it was discovered that past two iterations of training, Caid's system performance declined, instead of converging. This indicates that the learning law is learning information that is only coincidental to transaction prediction, instead of information that is specifically for transaction prediction. Accordingly, it is desirable to provide a new methodology for learning the relationships between merchants and consumers so as to properly reflect the significance of the frequency with which merchants appear in the transaction data.

SUMMARY OF THE INVENTION

The present invention overcomes the limitations of conventional approaches to consumer analysis by providing a
system and method of analyzing and predicting consumer financial behavior that uses historical, and time-sensitive, spending patterns of individual consumers to create both meaningful groupings (segments) of merchants which accurately reflect underlying consumer interests, and a predictive model of consumer spending patterns for each of the merchant segments. Current spending data of an individual consumer or groups of consumers can then be applied to the predictive models to predict future spending of the consumers in each of the merchant clusters.

In one aspect, the present invention includes the creation of data-driven grouping of merchants, based essentially on the actual spending patterns of a group of consumers. Spending data of each consumer is obtained, which describes the spending patterns of the consumers in a time-related fashion. For example, credit card data demonstrates not merely the merchants and amounts spent, but also the sequence in which purchases were made. One of the features of the invention is its ability to use the co-occurrence of purchases at different merchants to group merchants into meaningful merchant segments. That is, merchants which are frequently shopped at within some number of transactions or time period of each other reflect a meaningful cluster. This data-driven clustering of merchants more accurately describes the interests or preferences of consumers.

In a preferred embodiment, the analysis of consumer spending uses spending data, such as credit card statements, and processes that data to identify co-occurrences of purchases within defined co-occurrence windows, which may be based on either a number of transactions, a time interval, or other sequence related criteria. Each merchant is associated with a vector representation; the initial vectors for all of the merchants are randomized to present a quasi-orthogonal set of vectors in a merchant vector space. Each consumer's transaction data reflecting their purchases (e.g. credit card statements, bank statements, and the like) is chronologically organized to reflect the general order in which purchases were made at the merchants. Analysis of each consumer's transaction data in various co-occurrence windows identifies which merchants co-occur. For each pair of merchants, their respective merchant vectors are updated in the vector space as a function of the frequency of their co-occurrence. After processing of the spending data, the merchant vectors of merchants which are frequented together are generally aligned in the same direction in the merchant vector space. Clustering techniques are then applied to find clusters of merchants based on their merchant vectors. These clusters form the merchant segments, with each merchant segment having a list of merchants in it. Each merchant segment yields useful information about the type of merchants, their average purchase and transaction rates, and other statistical information. (Merchant "segments" and merchant "clusters" are used interchangeably herein.)

Preferably, each consumer is also given a profile that includes various demographic data, and summary data on spending habits. In addition, each consumer is preferably given a consumer vector. From the spending data, the merchants that the consumer has most frequently or recently purchased from are determined. The consumer vector is then the summation of these merchant vectors. As new purchases are made, the consumer vector is updated, preferably decaying the influence of older purchases. In essence, like the expression "you are what you eat," the present invention reveals "you are whom you shop at," since the vectors of the merchants are used to construct the vectors of the consumers.

An advantage of this approach is that both consumers and merchants are represented in a common vector space. This means that given a consumer vector, the merchant vectors which are "similar" to this consumer vector can be readily determined (that is, they point in generally the same direction in the merchant vector space), for example using dot product analysis. Thus, merchants who are "similar" to the consumer can be easily determined, these being merchants who would likely be of interest to the consumer, even if the consumer has never purchased from these merchants before.

Given the merchant segments, the present invention then creates a predictive model of future spending in each merchant segment, based on transaction statistics of historical spending in the merchant segment by those consumers who have purchased from merchants in the segments, in other segments, and data on overall purchases. In one embodiment, each predictive model predicts spending in a merchant cluster in a predicted time interval, such as 3 months, based on historical spending in the cluster in a prior time interval, such as the previous 6 months. During model training, the historical transactions in the merchant cluster for consumers who spent in the cluster are summarized in each consumer's profile in summary statistics, and input into the predictive model along with actual spending in a predicted time interval. Validation of the predicted spending with actual spending is used to confirm model performance. The predictive models may be neural networks, or other multivariate statistical models.

This modeling approach is advantageous for two reasons. First, the predictive models are specific to merchant clusters that actually appear in the underlying spending data, instead of for arbitrary classifications of merchants such as SIC classes. Second, because the consumer spending data of those consumers who actually purchased at the merchants in the merchant clusters is used, the models most accurately reflect how these consumers have spent and will spend at these merchants.

To predict financial behavior, the consumer profile of a consumer, using preferably the same type of summary statistics for a recent, past time period, is input into the predictive models for the different merchant clusters. The result is a prediction of the amount of money that the consumer is likely to spend in each merchant cluster in a future time interval, for which no actual spending data may yet be available.

For each consumer, a membership function may be defined which describes how strongly the consumer is associated with each merchant segment. (Preferably, the membership function outputs a membership value for each merchant segment.) The membership function may be the predicted future spending in each merchant segment, or it may be a function of the consumer vector for the consumer and a merchant segment vector (e.g. centroid of each merchant segment). The membership function can be weighted by the amount spent by the consumer in each merchant segment, or other factors. Given the membership function, the merchant clusters for which the consumer has the highest membership values are of particular interest: they are the clusters in which the consumer will spend the most money in the future, or whose merchants are most similar to the consumer's spending habits. This allows very specific and accurate targeting of promotions, advertising and the like to these consumers. A financial institution using the predicted spending information can direct promotional offers to consumers who are predicted to spend heavily in a merchant segment, with the promotional offers associated with merchants in the merchant segment.
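The construction of a consumer vector from merchant vectors, and its comparison with merchant vectors by dot product, may be pictured with the following minimal sketch. The function names, the three-component toy vectors, the decay factor of 0.5, and the normalization to unit length are all assumptions made for illustration; they are not taken from the specification.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumed here so dot products are comparable)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def consumer_vector(purchases, merchant_vectors, decay=0.5):
    """Sum the merchant vectors of a consumer's purchases (oldest first).

    The most recent purchase gets weight 1.0 and each earlier purchase is
    multiplied by `decay`, a hypothetical way of decaying older purchases.
    """
    dim = len(next(iter(merchant_vectors.values())))
    total = [0.0] * dim
    weight = 1.0
    for name in reversed(purchases):
        total = [t + weight * x for t, x in zip(total, merchant_vectors[name])]
        weight *= decay
    return normalize(total)

def rank_merchants(cvec, merchant_vectors):
    """Order merchants from most to least similar to the consumer vector."""
    return sorted(merchant_vectors,
                  key=lambda m: dot(cvec, merchant_vectors[m]),
                  reverse=True)

# Toy, already-trained merchant vectors (illustrative values only).
merchant_vectors = {
    "A": normalize([1.0, 0.1, 0.0]),
    "C": normalize([0.9, 0.2, 0.1]),
    "E": normalize([0.8, 0.0, 0.3]),
    "B": normalize([0.0, 1.0, 0.1]),
    "D": normalize([0.1, 0.9, 0.0]),
}

c1 = consumer_vector(["A", "C"], merchant_vectors)
ranking = rank_merchants(c1, merchant_vectors)
```

In this toy data the consumer has purchased only at merchants A and C, yet merchant E ranks ahead of B and D because its vector points in generally the same direction as the consumer vector.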
Also, given the membership values, changes in the membership values can be readily determined over time, to identify transitions by the consumer between merchant segments of interest. For example, each month (e.g. after a new credit card billing period or bank statement), the membership function is determined for a consumer, resulting in a new membership value for each merchant cluster. The new membership values can be compared with the previous month's membership values to indicate the largest positive and negative changes, revealing the consumer's changing purchasing habits. Positive changes reflect purchasing interests in new merchant clusters; negative changes reflect the consumer's lack of interest in a merchant cluster in the past month. Segment transitions such as these further enable a financial institution to target consumers with promotions for merchants in the segments in which the consumers show significant increases in membership values.

In another aspect, the present invention provides an improved methodology for learning the relationships between merchants in transaction data, and defining vectors which represent the merchants. More particularly, this aspect of the invention accurately identifies and captures the patterns of spending behavior which result in the co-occurrence of transactions at different merchants. The methodology is generally as follows:

First, the number of times that each pair of merchants co-occur with one another in the transaction data is determined. The underlying intuition here is that merchants whom the consumers' behavior indicates as being related will occur together often, whereas unrelated merchants do not occur together often. For example, a new mother will likely shop at children's clothes stores, toy stores, and other similar merchants, whereas a single young male will likely not shop at these types of merchants. The identification of merchants is by counting occurrences of merchants' names in the transaction data. The merchants' names may be normalized to reduce variations and equate different versions of a merchant's name to a single common name.

Next, a relationship strength between each pair of merchants is determined based on how much the observed co-occurrence of the merchants deviated from an expected co-occurrence of the merchant pair. The expected co-occurrence is based on statistical measures of how frequently the individual merchants appear in the transaction data or in co-occurrence events. Various relationship strength measures may be used based on, for example, standard deviations of predicted co-occurrence, or log-likelihood ratios.

The relationship strength measure has the features that two merchants that co-occur significantly more often than expected are positively related to one another; two merchants that co-occur significantly less often than expected are negatively related to one another; and two merchants that co-occur about the number of times expected are not related.

The relationship strength between each pair of merchants is then mapped into the vector space. This is done by determining the desired dot product between each pair of merchant vectors as a function of the relationship strength between the pair of merchants. This step has the feature that merchant vectors for positively related merchants have a positive dot product, the merchant vectors for negatively related merchants have a negative dot product, and the merchant vectors for unrelated merchants have a zero dot product.

Finally, given the determined dot products for merchant vector pairs, the locations of the merchant vectors are updated so that actual dot products between them at least closely approximate the desired dot products previously determined.

The present invention also includes a method for determining whether any two strings represent the same thing, such as variant spellings of a merchant name. This aspect of the invention is beneficially used to identify and normalize merchant names given what is typically a variety of different spellings or forms of a same merchant name in large quantities of transaction data. In this aspect of the invention, the frequency of individual trigrams (more generally, n-grams) for a set of strings, such as merchant names in transaction data, is determined. Each trigram is given a weight based on its frequency. Preferably, frequently occurring trigrams are assigned low weights, while rare trigrams are assigned high weights. A high dimensional vector space is defined, with one dimension for each trigram. Orthogonal unit vectors are defined for each trigram. Each string (e.g. merchant name) to be compared is given a vector in the trigram vector space. This vector is defined as the sum of the unit vectors for each trigram in the string, weighted by the trigram weight. Any two strings, such as merchant names, can now be compared by taking their dot product. If the dot product is above a threshold (determined from analysis of the data set), then the strings are deemed to be equivalents of each other. Normalizing the length of the string vectors may be used to make the comparison insensitive to the length of the original strings. With either partial normalization (normalization of one string but not the other) or non-normalization, string length influences the comparison, but may be used to match parts of one string against the entirety of another string. This methodology provides for an extremely fast and accurate mechanism for string matching. The matching process may be used to determine, for example, whether two merchant names are the same, two company names, two people names, or the like. This is useful in applications needing to reconcile divergent sources or types of data containing strings which refer to a common group of entities (e.g. transaction records from many transaction sources containing names of merchants).

The present invention may be embodied in various forms. As a computer program product, the present invention includes a data preprocessing module that takes consumer spending data and processes it into organized files of account related and time organized purchases. Processing of merchant names in the spending data is provided to normalize variant names of the same merchant. A data postprocessing module generates consumer profiles of summary statistics in selected time intervals, for use in training the predictive model. A predictive model generation system creates merchant vectors, and clusters them into merchant clusters, and trains the predictive model of each merchant segment using the consumer profiles and transaction data. Merchant vectors and consumer profiles are stored in databases. A profiling engine applies consumer profiles and consumer transaction data to the predictive models to provide predicted spending in each merchant segment, and to compute membership functions of the consumers for the merchant segment. A reporting engine outputs reports in various formats regarding the predicted spending and membership information. A segment transition detection engine computes changes in each consumer's membership values to identify significant transitions of the consumer between merchant clusters. The present invention may also be embodied as a system, with the above program product element cooperating with computer hardware components, and as a computer implemented method.
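The trigram-based string comparison summarized above may be pictured with the following minimal sketch. The weighting formula 1/(1 + ln(count)), the threshold of 0.5, the function names, and the sample merchant names are assumptions made for illustration only; the specification states only that frequent trigrams receive low weights, rare trigrams receive high weights, and the threshold is determined from the data.

```python
import math
from collections import Counter

def trigrams(s):
    """Lower-case the string, collapse whitespace, and list its trigrams."""
    s = " ".join(s.lower().split())
    return [s[i:i + 3] for i in range(len(s) - 2)]

def trigram_weights(strings):
    """Assumed weighting: the more often a trigram occurs, the lower its weight."""
    counts = Counter(t for s in strings for t in trigrams(s))
    return {t: 1.0 / (1.0 + math.log(c)) for t, c in counts.items()}

def string_vector(s, weights):
    """Sparse vector: one dimension per trigram, normalized to unit length."""
    vec = Counter()
    for t in trigrams(s):
        vec[t] += weights.get(t, 1.0)  # unseen trigrams treated as rare
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {t: x / norm for t, x in vec.items()} if norm else {}

def similarity(a, b, weights):
    """Dot product of the two normalized trigram vectors."""
    va, vb = string_vector(a, weights), string_vector(b, weights)
    return sum(x * vb.get(t, 0.0) for t, x in va.items())

def same_entity(a, b, weights, threshold=0.5):
    """Hypothetical threshold; in practice it would be tuned on the data set."""
    return similarity(a, b, weights) >= threshold

# Hypothetical merchant names, including a misspelled variant.
names = ["ACME HARDWARE", "ACME HARDWRE", "ZENITH TRAVEL"]
weights = trigram_weights(names)
close = similarity(names[0], names[1], weights)
far = similarity(names[0], names[2], weights)
```

Because both vectors are normalized here, the comparison is insensitive to string length; dropping the normalization of one vector would correspond to the partial normalization mentioned in the text.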
DESCRIPTION OF THE DRAWINGS

FIGS. 1a-1c are illustrations of merchant and consumer
vector representations.

FIG. 2 is a sample list of merchant segments. 

FIG. 3 is a flowchart of the overall process of the present 
invention. 

FIG. 4a is an illustration of the system architecture of one 
embodiment of the present invention during operation. 

FIG. 4b is an illustration of the system architecture of the 
present invention during development and training of mer- 
chant vectors, and merchant segment predictive models. 

FIG. 5 is an illustration of the functional components of 
the predictive model generation system. 

FIGS. 6a and 6b are illustrations of forward and backward 
co-occurrence windows. 

FIG. 7a is an illustration of the master file data prior to
stemming and equivalencing, and FIG. 7b is an illustration
of a forward co-occurrence window in this portion of the
master file after stemming and equivalencing.

FIG. 8 is an illustration of the various types of observa- 
tions during model training. 

FIG. 9 is an illustration of the application of multiple 
consumer account data to the multiple segment predictive 
models. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

A. Overview of Consumer and Merchant Vector Represen-
tation and the Co-occurrence of Merchant Purchases

B. System Overview 

C. Functional Overview 

D. Data Preprocessing Module 

E. Predictive Model Generation System 

1. Merchant Vector Generation 

2. Training of Merchant Vectors: The UDL Algorithm 

a) Co-occurrence Counting 

i) Forward co-occurrence counting 

ii) Backward co-occurrence counting 

iii) Bi-directional co-occurrence counting 

b) Estimating Expected Co-occurrence Counts 

c) Desired Dot-Products between Merchant Vectors 

d) Merchant Vector Training 

3. Clustering Module 

F. Data Postprocessing Module

G. Predictive Model Generation 

H. Profiling Engine 

1. Membership Function: Predicted Spending In Each 
Segment 

2. Segment Membership Based on Consumer Vectors 

3. Updating of Consumer Profiles 

I. Reporting Engine 

1. Basic Reporting Functionality 

2. General Segment Report 

a) General Segment Information 

b) Segment Members Information 

c) Lift Chart 

d) Population Statistics Tables 

i) Segment Statistics 

ii) Row Descriptions

J. Targeting Engine

K. Segment Transition Detection

A. Overview of Consumer and Merchant Vector Repre-
sentation and the Co-occurrence of Merchant Purchases
One feature of the present invention that enables predic-
tion of consumer spending levels at specific merchants is the
ability to represent both consumers and merchants in the same
modeling representation. A conventional example is
attempting to classify both consumers and merchants with 
demographic labels (e.g. "baby boomers", or "empty- 
nesters"). This conventional approach is simply arbitrary, 
and does not provide any mechanisms for directly quanti- 
fying how similar a consumer is to various merchants. The 
present invention, however, does provide such a quantifiable 
analysis, based on high-dimensional vector representations 
of both consumers and merchants, and the co-occurrence of 
merchants in the spending data of individual consumers. 

Referring now to FIGS. 1a and 1b, there is shown a
simplified model of the vector space representation of mer-
chants and consumers. The vector space 100 is shown here
with only three axes, but in practice is a high dimensional
hypersphere, typically having 100-300 components. In this
vector space 100, each merchant is assigned a merchant
vector. Preferably, the initial assignment of each merchant's
vector contains essentially randomly valued vector
components, to provide for a quasi-orthogonal distribution
of merchant vectors. This means that initially, the merchant
vectors are essentially perpendicular to each other, so that
there is no predetermined or assumed association or simi-
larity between merchants.
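The quasi-orthogonal initialization can be illustrated with a short sketch. It assumes Gaussian-distributed components, unit-length vectors, 300 components, and a fixed random seed; none of these specifics are stated in the text, which says only that the components are essentially randomly valued. In a high dimensional space, independently drawn random vectors have dot products close to zero, which is what makes them nearly perpendicular.

```python
import itertools
import math
import random

def random_unit_vector(dim, rng):
    """Draw Gaussian components (an assumption) and scale to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def init_merchant_vectors(names, dim=300, seed=0):
    """Assign each merchant an essentially random initial vector."""
    rng = random.Random(seed)
    return {name: random_unit_vector(dim, rng) for name in names}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

vectors = init_merchant_vectors(["A", "B", "C", "D", "E"])

# Largest absolute dot product over all merchant pairs; values near zero
# mean the initial vectors are nearly perpendicular to each other.
max_abs_dot = max(abs(dot(vectors[a], vectors[b]))
                  for a, b in itertools.combinations(vectors, 2))
```

With only three components, as drawn in the figure, random vectors would often be far from perpendicular; the near-orthogonality relies on the high dimensionality.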

In FIG. 1a, there are shown merchant vectors for five
merchants, A, B, C, D, and E after initialization, and prior to
being updated. Merchant A is an upscale clothing store,
merchant B is a discount furniture store, merchant C is an
upscale furniture store, merchant D is a discount clothing
catalog outlet, and merchant E is an online store for fashion
jewelry. As shown in FIG. 1c, merchants A and D have the
same SIC code because they are both clothing stores, and
merchants B and C have the same SIC code because they are
both furniture stores. In other words, the SIC codes do not
distinguish between the types of consumers who frequent
these stores.

In FIG. 1b, there is shown the same vector space 100 after
consumer spending data has been processed according to the
present invention to train the merchant vectors. The training
of merchant vectors is based on co-occurrence of merchants
in each consumer's transaction data. FIG. 1c illustrates
consumer transaction data 104 for two consumers, C1 and
C2. The transaction data for C1 includes transactions 110 at
merchants A, C, and E. In this example, the transactions at
merchants A and C co-occur within a co-occurrence window
108; likewise the transactions at merchants C and E co-occur
within a separate co-occurrence window 108. The transac-
tion data for C2 includes transactions 110 at merchants B
and D, which also form a co-occurrence event.
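The co-occurrence events in this example can be reproduced with a small counting sketch. It assumes a forward window measured in number of transactions (here, one transaction ahead), that each consumer's transactions are already in chronological order, and that a pair is counted without regard to order; the function name and data layout are hypothetical.

```python
from collections import Counter

def forward_cooccurrences(transactions, window=1):
    """Count merchant pairs whose transactions fall within `window`
    subsequent transactions of each other (a forward window)."""
    pairs = Counter()
    for i, merchant in enumerate(transactions):
        for other in transactions[i + 1:i + 1 + window]:
            if other != merchant:
                # Store the pair in sorted order so (A, C) and (C, A) match.
                pairs[tuple(sorted((merchant, other)))] += 1
    return pairs

# Chronologically ordered merchants for consumers C1 and C2, as in FIG. 1c.
transaction_data = {"C1": ["A", "C", "E"], "C2": ["B", "D"]}

counts = Counter()
for sequence in transaction_data.values():
    counts.update(forward_cooccurrences(sequence, window=1))
```

With a window of one transaction, C1 contributes the pairs (A, C) and (C, E), and C2 contributes (B, D). Widening the window to two transactions would also count (A, E) for C1.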

Merchants for whom transactions co-occur in a consum- 
er's spending data have their vectors updated to point more 
in the same direction in the vector space, that is making their 
respective vector component values more similar. 
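One way to picture such an update is the sketch below, which repeatedly nudges a pair of unit vectors so that their dot product moves toward a desired value. The update rule, the learning rate of 0.1, the target of 0.8, and the renormalization step are assumptions chosen for illustration and are not the training algorithm of the specification.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def nudge_pair(u, v, target, rate=0.1):
    """Move dot(u, v) a small step toward `target`.

    Each vector is shifted along the other in proportion to the remaining
    error, then rescaled to unit length (an assumed normalization).
    """
    err = target - dot(u, v)
    new_u = normalize([a + rate * err * b for a, b in zip(u, v)])
    new_v = normalize([b + rate * err * a for a, b in zip(u, v)])
    return new_u, new_v

# Two initially perpendicular merchant vectors (toy three-component example).
vec_a = [1.0, 0.0, 0.0]
vec_c = [0.0, 1.0, 0.0]
before = dot(vec_a, vec_c)

for _ in range(20):
    vec_a, vec_c = nudge_pair(vec_a, vec_c, target=0.8)

after = dot(vec_a, vec_c)
```

After the updates the two vectors point more nearly in the same direction, so their component values are more similar than at initialization. A negative target would push the pair apart in the same way.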

Thus, in FIG. 1b, following processing of the consumer
transaction data, the merchant vectors for merchants A, C,
and E have been updated, based on actual spending data,
such as C1's transactions, to point generally in the same
direction, as have the merchant vectors for merchants B and
D, based on C2's transactions. Clustering techniques are
then used to identify clusters or segments of merchants
based on their merchant vectors 402. In the example of FIG.
1b, a merchant segment is defined to include merchants A,
C, and E, such as "upscale-technology_savvy." Note that as
defined above, the SIC codes of these merchants are entirely
unrelated, and so SIC code analysis would not reveal this
group of merchants. Further, a different segment with merchants
B and D is identified, even though the merchants share the same
SIC codes with the merchants in the first segment, as shown in
the transaction data 104.

Each merchant segment is associated with a merchant segment
vector 105, preferably the centroid of the merchant cluster.
Based on the types of merchants in the merchant segment, and
the consumers who have purchased in the segment, a segment name
can be defined, and may express the industry, sub-industry,
geography, and/or consumer demographics.

The merchant segments provide very useful information about the
consumers. In FIG. 1b there are shown the consumer vectors 106
for consumers C1 and C2. Each consumer's vector is a summary
vector of the merchants at which the consumer shops. This
summary is preferably the vector sum of the merchant vectors of
the merchants at which the consumer has shopped in a defined
recent time interval. The vector sum can be weighted by the
recency of the purchases, their dollar amount, or other factors.

Being in the same vector space as the merchant vectors, the
consumer vectors 106 reveal the consumer's interests in terms of
their actual spending behavior. This information is by far a
better base upon which to predict consumer spending at merchants
than superficial demographic labels or categories. Thus,
consumer C1's vector is very strongly aligned with the merchant
vectors of merchants A, C, and E, indicating C1 is likely to be
interested in the products and services of these merchants. C1's
vector can be aligned with these merchants, even if C1 never
purchased at any of them before. Thus, merchants A, C, and E
have a clear means for identifying consumers who may be
interested in purchasing from them.

Which consumers are associated with which merchant segments can
also be determined by a membership function. This function can
be based entirely on the merchant segment vectors and the
consumer vectors (e.g. dot product), or on other quantifiable
data, such as amount spent by a consumer in each merchant
segment, or a predicted amount to be spent.

Given the consumers who are members of a segment, useful
statistics can be generated for the segment, such as average
amount spent, spending rate, ratios of how much these consumers
spend in the segment compared with the population average, and
so forth. This information enables merchants to finely target
and promote their products to the appropriate consumers.

FIG. 2 illustrates portions of a sample index of merchant
segments, as may be produced by the present invention. Segments
are named by assigning each segment a unique segment number 200
between 1 and M, the total number of segments. In addition, each
segment has a description field 210 which describes the merchant
segment. A preferred description field is of the form:

Major Categories: Minor Categories: Demographics: Geography

Major categories 202 describe how the customers in a merchant
segment typically use their accounts. Uses include retail
purchases, direct marketing purchases, and where this type
cannot be determined, then other major categories, such as
travel uses, educational uses, services, and the like. Minor
categories 204 describe either a subtype of the major category
(e.g. subscriptions being a subtype of direct marketing) or the
products or services (e.g. housewares, sporting goods,
furniture) commonly purchased in the segment. Demographics
information 206 uses account data from the consumers who
frequent this segment to describe the most frequent or average
demographic features, such as age range or gender, of the
consumers. Geographic information 208 uses the account data to
describe the most common geographic location of transactions in
the segment. In each portion of the segment description 210 one
or more descriptors may be used (i.e. multiple major, minor,
demographic, or geographic descriptors). This naming convention
is much more powerful and fine-grained than conventional SIC
classifications, and provides insights into not just the
industries of different merchants (as in SIC) but more
importantly, into the geographic, approximate age or gender, and
lifestyle choices of consumers in each segment.

The various types of segment reports are further described in
section I. Reporting Engine, below.

B. System Overview

Turning now to FIG. 4a there is shown an illustration of a
system architecture of one embodiment of the present invention
during operation in a mode for predicting consumer spending.
System 400 includes a data preprocessing module 402, a data
postprocessing module 410, a profiling engine 412, and a
reporting engine 426. Optional elements include a segment
transition detection engine 420 and a targeting engine 422.
System 400 operates on different types of data as inputs,
including consumer summary file 404 and consumer transaction
file 406, generates interim models and data, including the
consumer profiles in profile database 414, merchant vectors 416,
merchant segment predictive models 418, and produces various
useful outputs including various segment reports 428-432.

FIG. 4b illustrates system 400 during operation in a training
mode, and here additionally includes predictive model generation
system 440.

C. Functional Overview

Referring now to FIG. 3, there is shown a functional overview of
the processes supported by the present invention. The process
flow illustrated and described here is exemplary of how the
present invention may be used, but does not limit the present
invention to this exact process flow, as variants may be easily
devised.

Generally then, master files 408 are created or updated 300 from
account transaction data for a large collection of consumers
(account holders) of a financial institution, as may be stored
in the consumer summary files 404 and the consumer transaction
files 406. The master files 408 collect and organize the
transactions of each consumer from different statement periods
into a date ordered sequence of transaction data for each
consumer. Processing of the master files 408 normalizes merchant
names in the transaction data, and generates frequency
statistics on the frequency of occurrence of merchant names.

In a training mode, the present invention creates or updates 302
merchant vectors associated with the merchant names. The
merchant vectors are based on the co-occurrence of merchant
names in defined co-occurrence windows (such as a number of
transactions or period of time). Co-occurrence statistics are
used to derive measures of how closely related any two merchants
are based on their frequencies of co-occurrence with each other,
and with other merchants. The relationship measures in turn
influence the positioning of merchant vectors in the vector
space so that merchants who frequently co-occur have vectors
which are similarly oriented in the vector space, and the degree
of similarity of the merchant vectors is a function of their
co-occurrence rate.

The merchant vectors are then clustered 304 into merchant
segments. The merchant segments generally describe groups of
merchants which are naturally (in the data)
shopped at "together" based on the transactions of the many
consumers. Each merchant segment has a segment vector computed
for it, which is a summary (e.g. centroid) of the merchant
vectors in the merchant segment. Merchant segments provide very
rich information about the merchants that are members of the
segments, including statistics on rates and volumes of
transactions, purchases, and the like.

With the merchant segments now defined, a predictive model of
spending behavior is created 306 for each merchant segment. The
predictive model for each segment is derived from observations
of consumer transactions in two time periods: an input time
window and a subsequent prediction time window. Data from
transactions in the input time window for each consumer
(including both segment specific and cross-segment) is used to
extract independent variables, and actual spending in the
prediction window provides the dependent variable. The
independent variables typically describe the rate, frequency,
and monetary amounts of spending in all segments and in the
segment being modeled. A consumer vector derived from the
consumer's transactions may also be used. Validation and
analysis of the segment predictive models may be done to confirm
the performance of the models.

In the production phase, the system is used to predict spending,
either in future time periods for which there is no actual data
as of yet, or in a recent past time period for which data is
available and which is used for retrospective analysis.
Generally, each account (or consumer) has a profile summarizing
the transactional behavior of the account holder. This
information is created, or updated 308 with recent transaction
data if present, to generate the appropriate variables for input
into the predictive models for the segments. (Generation of the
independent variables for model generation may also involve
updating 308 of account profiles.)

Each account further includes a consumer vector which is
derived, e.g. as a summary vector, from the merchant vectors of
the merchants at which the consumer has purchased in a defined
time period, say the last three months. Each merchant vector's
contribution to the consumer vector can be weighted by the
consumer's transactions at the merchants, such as by transaction
amounts, rates, or recency. The consumer vectors, in conjunction
with the merchant segment vectors, provide an initial level of
predictive power. Each consumer can now be associated with the
merchant segment having a merchant segment vector closest to the
consumer vector for the consumer.

Using the updated account profiles, this data is input into the
set of predictive models to generate 310, for each consumer, an
amount of predicted spending in each merchant segment in a
desired prediction time period. For example, the predictive
models may be trained on a six month input window to predict
spending in a subsequent three month prediction window. The
predicted period may be an actual future period or a current
(e.g. recently ended) period for which actual spending is
available.

The predicted spending levels and consumer profiles allow for
various levels and types of account and segment analysis 312.
First, each account may be analyzed to determine which segment
(or segments) the account is a member of, based on various
membership functions. A preferred membership function is the
predicted spending value, so that each consumer is a member of
the segment for which they have the highest predicted spending.
Other measures of association between accounts and segments may
be based on percentile rankings of each consumer's predicted
spending across the various merchant segments. With any of these
(or similar) methods of determining which consumers are
associated with which segments, an analysis of the rates and
volumes of different types of transactions by consumers in each
segment can be generated. Further, targeting of accounts in one
or more segments may be used to selectively identify populations
of consumers with predicted high dollar amount or transaction
rates. Account analysis also identifies consumers who have
transitioned between segments as indicated by increased or
decreased membership values.

Using targeting criteria, promotions directed 314 to specific
consumers in specific segments and the merchants in those
segments can be realized. For example, given a merchant segment,
the consumers with the highest levels (or rankings) of predicted
spending in the segment may be identified, or the consumers
having consumer vectors closest to the segment vector may be
selected. Or, the consumers who have highest levels of increased
membership in a segment may be selected. The merchants which
make up the segment are known from the segment clustering 304.
One or more promotional offers specific to merchants in the
segment can be created, such as discounts, incentives and the
like. The merchant-specific promotional offers are then directed
to the selected consumers. Since these account holders have been
identified as having the greatest likelihood of spending in the
segment, the promotional offers beneficially coincide with their
predicted spending behavior. This desirably results in an
increased success rate at which the promotional offers are
redeemed.

These and other uses and applications of the present invention
will be apparent to those of skill in the art.

D. Data Preprocessing Module

The data preprocessing module 402 (DPM) does initial processing
of consumer data received from a source of consumer accounts and
transactions, such as a credit card issuer, in preparation for
creating the merchant vectors, consumer vectors, and merchant
segment predictive models. DPM 402 is used in both production
and training modes. (In this disclosure, the terms "consumer,"
"customer," and "account holder" are used interchangeably.)

The inputs for the DPM are the consumer summary file 404 and the
consumer transaction file 406. Generally, the consumer summary
file 404 provides account data on each consumer whose
transaction data is to be processed, such as account number and
other account identifying and descriptive information. The
consumer transaction file 406 provides details of each
consumer's transactions. The DPM 402 processes these files to
organize both sets of data by account identifiers of the
consumer accounts, and merges the data files so that each
consumer's summary data is available with their transactions.

Customer summary file 404: The customer summary file 404
contains one record for each customer that is profiled by the
system, and includes account information of the customer's
account, and optionally includes demographic information about
the customer. The consumer summary file 404 is typically one
that a financial institution, such as a bank, credit card
issuer, department store, and the like maintains on each
consumer. The customer or the financial institution may supply
the additional demographic fields which are deemed to be of
informational or of predictive value. Examples of demographic
fields include age, gender and income; other demographic fields
may be provided, as desired by the financial institution.

Table 1 describes one set of fields for the customer summary
file 404 for a preferred embodiment. Most fields are
self-explanatory. The only required field is an account
identifier that uniquely identifies each consumer account and
transactions. This account identifier may be the same as the
consumer's account number; however, it is preferable to
have a different identifier used, since a consumer may have
multiple account relationships with the financial institution
(e.g. multiple credit cards or bank accounts), and all
transactions of the consumer should be dealt with together. The
account identifier is preferably derived from the account
number, such as by a one-way hash or encrypted value, such
that each account identifier is uniquely associated with an
account number. The pop_id field is optionally used to
segment the population of customers into arbitrary distinct
populations as specified by the financial institution, for
example by payment history, account type, geographic
region, etc.
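
Deriving an account identifier by a one-way hash can be sketched as follows. The keyed SHA-256 construction, the secret key, the function name, and the truncation to 24 characters are all assumptions chosen for illustration; the description does not name a particular hash. Truncating the digest makes uniqueness probabilistic rather than guaranteed, and linking several account numbers of one consumer to a single identifier would need additional data not shown here.

```python
import hashlib
import hmac


def account_identifier(account_number: str, secret_key: bytes, length: int = 24) -> str:
    """Derive a fixed-length identifier from an account number with a keyed
    one-way hash, so the number itself need not appear in working files."""
    digest = hmac.new(secret_key, account_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


key = b"example-key-not-for-production"  # hypothetical key held by the institution
id_a = account_identifier("4111111111111111", key)
id_b = account_identifier("4111111111111112", key)
# The same account number always maps to the same identifier,
# while different account numbers map to different identifiers.
```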



TABLE 1

Customer Summary File

Description                    Sample Format

Account_id                     Char[max 24]
Pop_id                         Char ('1'-'N')
Account_number                 Char[max 16]
Credit_bureau_score            Short int as string
Internal_credit_risk_score     Short int as string
Ytd_purchases                  Int as string
Ytd_cash_adv                   Int as string
Ytd_int_purchases              Int as string
Ytd_int_cash_adv               Int as string
State_code                     Char[max 2]
Zip_code                       Char[max 5]
Demographic_1                  Int as string
Demographic_N                  Int as string



Note the additional, optional demographic fields for con- 
taining demographic information about each consumer. In 
addition to demographic information, various summary sta- 
tistics of the consumer's account may be included. These 
include any of the following: 



TABLE 2

Example Demographic Fields for Customer Summary File

Description                        Explanation

Cardholder zip code
Months on books or open date
Number of people on the account    Equivalent to number of plastics
Credit risk score
Cycles delinquent
Credit line
Open to buy
Initial month statement balance    Balance on the account prior to the first month of transaction data pull
Last month statement balance       Balance on the account at the end of the transaction data pulled
Monthly payment amount             For each month of transaction data contributed or the average over last year.
Monthly cash advance amount        For each month of transaction data contributed or the average over last year.
Monthly cash advance count         For each month of transaction data contributed or the average over last year.
Monthly purchase amount            For each month of transaction data contributed or the average over last year.
Monthly purchase count             For each month of transaction data contributed or the average over last year.
Monthly cash advance interest      For each month of transaction data contributed or the average over last year.
Monthly purchase interest          For each month of transaction data contributed or the average over last year.
Monthly late charge                For each month of transaction data contributed or the average over last year.



Consumer transaction file 406. The consumer transaction
file 406 contains transaction level data for the consumers in
the consumer summary file. The shared key is the account
id. In a preferred embodiment, the transaction file has the
following description.

TABLE 3

Consumer Transaction File

Description             Sample Format

Account_id              Quoted char(24) - [0-9]
Account_number          Quoted char(16) - [0-9]
Pop_id                  Quoted char(1) - [0-128]
Transaction_code        Integer
Transaction_amount      Float
Transaction_time        HH:MM:SS
Transaction_date        YYYYMMDD
Transaction_type        Char(5)
SIC_code                Char(5) - [0-9]
Merchant_descriptor     Char(25)
SKU_Number              Variable length list
Merchant_zip_code       Char[max 5]



The SKU and merchant zip code data are optional, and 
may be used for more fine-grained filtering of which trans- 
actions are considered as co-occurring. 

The output for the DPM is the collection of master files 
408 containing a merged file of the account information and 
transaction information for each consumer. The master file is 
generated as a preprocessing step before inputting data to the 
profiling engine 412. The master file 408 is essentially the 
customer summary file 404 with the consumer's transactions 
appended to the end of each consumer's account record. 
Hence the master file has variable length records. The master 
files 408 are preferably stored in a database format allowing 
for SQL querying. There is one record per account identifier. 
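
The structure of a master file record can be sketched as a merge of the two input files on the account identifier. This is an illustrative sketch only: the in-memory dictionaries, the field names `account_id`, `date`, and `merchant`, and the function name are assumptions, and a production system would use the database format described above.

```python
from collections import defaultdict


def build_master(summaries, transactions):
    """Return one record per account: the summary fields followed by that
    account's transactions in date order."""
    by_account = defaultdict(list)
    for txn in transactions:
        by_account[txn["account_id"]].append(txn)

    master = {}
    for summary in summaries:
        account_id = summary["account_id"]
        # YYYYMMDD strings sort chronologically as plain text.
        ordered = sorted(by_account.get(account_id, []), key=lambda t: t["date"])
        master[account_id] = {**summary, "transactions": ordered}
    return master


summaries = [
    {"account_id": "001", "state_code": "CA"},
    {"account_id": "002", "state_code": "WA"},
]
transactions = [
    {"account_id": "001", "date": "19990215", "merchant": "C"},
    {"account_id": "001", "date": "19990103", "merchant": "A"},
    {"account_id": "002", "date": "19990120", "merchant": "B"},
]
master = build_master(summaries, transactions)
```

Because each account carries a different number of transactions, the resulting records are of variable length, as noted above.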

In a preferred embodiment, the master files 408 have the 
following information: 



TABLE 4

Master File 408

Description                    Sample Format

Account_id                     Char[max 24]
Pop_id                         Char ('1'-'N')
Account_number                 Char[max 16]
Credit_bureau_score            Short int as string
Ytd_purchases                  Int as string
Ytd_cash_advances              Int as string
Ytd_interest_on_purchases      Int as string
Ytd_interest_on_cash_advs      Int as string
State_code                     Char[max 2]
Demographic_1                  Int as string
Demographic_N                  Int as string
<transactions>
The transactions included for each consumer include the various
data fields described above, and any other per-transaction
optional data that the financial institution desires to track.

The master file 408 preferably includes a header that indicates
last update and number of updates. The master file may be
incrementally updated with new customers and new transactions
for existing customers. The master file database is preferably
updated on a monthly basis to capture new transactions by the
financial institution's consumers.

The DPM 402 creates the master file 408 from the consumer
summary file 404 and consumer transaction file 406 by the
following process:

a) Verify minimum data requirements. The DPM 402 determines the
number of data files it is handling (since there may be many
physical media sources), and the length of the files to
determine the number of accounts and transactions. Preferably, a
minimum of 12 months of transactions for a minimum of 2 million
accounts are used to provide fully robust models of merchants
and segments. However, there is no formal lower bound to the
amount of data on which system 400 may operate.

b) Data cleaning. The DPM 402 verifies valid data fields, and
discards invalid records. Invalid records are records that are
missing any of the required fields for the customer summary file
or the transaction file. The DPM 402 also indicates missing
values for fields that have corrupt or missing data and are
optional. Duplicate transactions are eliminated using account
ID, account number, transaction code, transaction amount, date,
and merchant description as a key.

c) Sort and merge files. The consumer summary file 404 and the
consumer transaction file 406 are both sorted by account ID; the
consumer transaction file 406 is further sorted by transaction
date. Additional sorting of the transaction file, for example on
time, type of transaction, merchant zip code, may be applied to
further influence the determination of merchant co-occurrence.
The sorted files are merged into the master file 408, with one
record per account, as described above.

Due to the large volume of data involved in this stage,
compression of the master files 408 is preferred, where
on-the-fly compression and decompression is supported. This
often improves system performance due to decreased I/O. In
addition, as illustrated in FIG. 4a, the master file 408 may be
split into multiple subfiles, such as splitting by population
ID, or other variable, again to reduce the amount of data being
handled at any one time.

E. Predictive Model Generation System

Referring to FIG. 4b, the predictive model generation system 440
takes as its inputs the master file 408 and creates the consumer
profiles and consumer vectors, the merchant vectors and merchant
segments, and the segment predictive models. This data is used
by the profiling engine to generate predictions of future
spending by a consumer in each merchant segment using inputs
from the data postprocessing module 410.

FIG. 5 illustrates one embodiment of the predictive model
generation system 440 that includes three modules: a merchant
vector generation module 510, a clustering module 520, and a
predictive model generation module 530.

1. Merchant Vector Generation

Merchant vector generation is application of a context vector
type analysis to the account data of the consumers, and more
particularly to the master files 408. The operations for
merchant vector generation are managed by the merchant vector
generation module 510.

In order to obtain the initial merchant vectors, additional
processing of the master files 408 precedes the analysis of
which merchants co-occur in the master files 408. There are two
sequential processes that are used on the merchant descriptions:
stemming and equivalencing. These operations normalize
variations of individual merchants' names to a single common
merchant name to allow for consistent identification of
transactions at the merchant. This processing is managed by the
vector generation module 510.

Stemming is the process of removing extraneous characters from
the merchant descriptions. Examples of extraneous characters
include punctuation and trailing numbers. Trailing numbers are
removed because they usually indicate the particular store in a
large chain (e.g. Wal-Mart #12345). It is preferable to identify
all the outlets of a particular chain of stores as a single
merchant description. Stemming optionally converts all letters
to lower case, and replaces all space characters with a dash.
This causes all merchant descriptions to be an unbroken string
of non-space characters. The lower case constraint has the
advantage of making it easy to distinguish non-stemmed merchant
descriptions from stemmed descriptions.

Equivalencing is applied after stemming, and identifies various
different spellings of a particular merchant's description as
being associated with a single merchant description. For
example, the "Roto-Rooter" company may occur in the transaction
data with the following three stemmed merchant descriptions:
"ROTO-ROOTER-SEWER-SERV", "ROTO-ROOTER-SERVICE", and
"ROTO-ROOTER-SEWER-DR". An equivalence table is set up
containing a root name and a list of all equivalent names. In
this example, ROTO-ROOTER-SEWER-SERV becomes the root name, and
the latter two of these descriptions are listed as equivalents.
During operation, such as generation of subsequent master files
408 (e.g. the next monthly update), an identified equivalenced
name is replaced with its root name from the equivalence table.

In one embodiment, equivalencing proceeds in two steps, with an
optional third step. The first equivalencing step uses a fuzzy
trigram matching algorithm that attempts to find merchant
descriptions with nearly identical spellings. This method
collects statistics on all the trigrams (sets of three
consecutive letters in a word) in all the merchant descriptions,
and maintains a list of the trigrams in each merchant
description. The method then determines a closeness score for
any two merchant names that are supplied for comparison, based
on the number of trigrams the merchant names have in common. If
the two merchant names are scored as being sufficiently close,
they are equivalenced. Appendix I, below, provides a novel
trigram matching algorithm useful for equivalencing merchant
names (and other strings). This algorithm uses a vector
representation of each trigram, based on trigram frequency in
the data set, to construct trigram vectors, and judges closeness
based on vector dot products.

Preferably, equivalencing is applied only to merchants that are
assigned the same SIC code. This constraint is useful
since two merchants may have a similar name, but if they are in
different SIC classifications there is a good chance that they
are, in fact, different businesses.

The second equivalencing step consists of fixing a group of
special cases. These special cases are identified as experience
is gained with the particular set of transaction data being
processed. There are two broad classes that cover most of these
special cases: a place name is used instead of a number to
identify specific outlets in a chain of stores, and some
department stores append the name of the specific department to
the name of the chain. An example of the first case is U-Haul,
where stemmed descriptions look like U-HAUL-SAN-DIEGO,
U-HAUL-ATLANTA, and the like. An example of the second case is
Robinsons-May department stores, with stemmed descriptions like
ROBINSONMAY-LEE-WOMEN, ROBINSONMAY-LEVI-SHORT,
ROBINSONMAY-TRIFARI-CO, and ROBINSONMAY-JANE-ASHLE. In both
cases, any merchant descriptions in the correct SIC codes that
contain the root name (e.g. U-HAUL or ROBINSONMAY) are
equivalenced to the root name.

A third, optional step includes a manual inspection and
correction of the descriptions for the highest frequency
merchants. The number of merchants subjected to this inspection
varies, depending upon the time constraints in the processing
stream. This step catches the cases that are not amenable to the
two previous steps. An example is Microsoft Network, with
merchant descriptions like MICROSOFT-NET and MSN-BILLING. With
enough examples from the transaction data, these merchant
descriptors can also be added to the special cases in step two,
above.

Preferably, at least one set of master files 408 is generated
before the equivalencing is determined. This is desirable in
order to compile statistics on frequencies of each merchant
description within each SIC code before the equivalencing is set
up.

Once the equivalencing table is constructed, the original master
files 408 are re-built using the equivalenced merchant
descriptions. This step replaces all equivalenced merchant
descriptors with their associated root names, thereby ensuring
that all transactions for the merchant are associated with the
same merchant descriptor. Subsequent incoming transaction data
can be equivalenced before it is added to the master files,
using the original equivalence table.

Given the equivalence table, a merchant descriptor frequency
list can be determined describing the frequency of occurrence of
each merchant descriptor (including its equivalents).

Once the equivalence table is defined an initial merchant vector
is assigned to each root name. The merchant vector training
based on co-occurrence is then performed, processing the master
files by account ID and then by date as described above.

2. Training of Merchant Vectors: The UDL Algorithm

merchants, and a total number of co-occurrence events. The
actual number of co-occurrences of a pair of merchants is
determined. If a pair of merchants co-occur more frequently than
expected, then the merchants are positively related, and the
strength of that relationship is a function of the "unexpected"
amount of co-occurrence. If the pair of merchants co-occur less
frequently than expected, then the merchants are negatively
related. If a pair of merchants co-occur in the data about the
same as expected, then there is generally no relationship
between them. Using the relationship strengths of each pair of
merchants as the desired dot product between their merchant
vectors, the values of the merchant vectors can be determined in
the vector space. This process is the basis of the Unexpected
Deviation Learning algorithm or "UDL".

This approach overcomes the problems associated with
conventional vector based models of representation, which tend
to be based on overall frequencies of terms relative to the
database as a whole. Specifically, in a conventional model, the
high frequency merchants, that is merchants for which there are
many, many purchases, would co-occur with many other merchants,
and either falsely suggest that these other merchants are
related to the high frequency merchants, or simply be so heavily
down-weighted as to have very little influence at all. That is,
high frequency merchant names would be treated as high frequency
English words like "the" and "and", and so forth, which are
given very low weights in conventional vector systems
specifically because of their high frequency.

However, the present invention takes account of the high
frequency presence of individual merchants and instead analyzes
the expected rate at which merchants, including high frequency
merchants, co-occur with other merchants. High frequency
merchants are expected to co-occur more frequently. If a high
frequency merchant and another merchant co-occur even more
frequently than expected, then there is a positive correlation
between them. The present invention thus accounts for the high
frequency merchants in a manner that conventional methodologies
cannot.

The overall process of modeling the merchant vectors using
unexpected deviation is as follows:

1. First, count the number of times that the merchants co-occur
with one another in the data. The intuition is that related
merchants occur together often, whereas unrelated merchants do
not occur together often.

2. Next, calculate the relationship strength between merchants
based on how much the observed co-occurrence deviated from the
expected co-occurrence. The relationship strength has the
following characteristics:

Two merchants that co-occur significantly more often than
expected are positively related to one another.

Two merchants that co-occur significantly less often than
expected are negatively related to one another.

Two merchants that co-occur about the number of times expected
are not related.

As noted above, the merchant vectors are based on the 3 - Ma P the relationship strength onto vector space; that is, 
co-occurrence of merchants in each consumer's transaction 55 determine the desired dot product between the merchant 
data. The master files 408, which are ordered by account and vectors for all pairs of items given their relationship 
within account by transaction date, are processed by strength. The mapping results in the following character- 
account, and then in date order to identify groups of istics: 

co-occurring merchants. The co-occurrence of merchant The merchant vectors for positively related merchants 

names (once equivalenced) is the basis of updating the 60 hav e a positive dot product. 

values of the merchant vectors. The merchant vectors for negatively related merchants 

The training of merchant vectors is based upon the have a negative dot product, 

unexpected deviation of co-occurrences of merchants in The merchant vectors for unrelated merchants have a zero 

transactions. More particularly, an expected rate at which dot product. 

any pair of merchants co-occur in the transaction data is 65 4. Update the merchant vectors from their initial 

estimated based upon the frequency with which each inch- assignments, so that the dot products between them at 

vidual merchant appears in co-occurrence with any other least closely approximate the desired dot products. 
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The next sections explain this process in further detail, 
a) Co-occurrence Counting 

Co-occurrence counting is the procedure of counting the 
number of times that two items, here merchant descriptions, 
co-occur within a fixed size co-occurrence window in some 
set of data, here the transactions of the consumers. Counting 
can be done forwards, backwards, or bi-directionally. The 
best way to illustrate co-occurrence counting is to give an 
example for each type of co-occurrence count: 

Example: Consider the sequence of merchant names:
M1 M3 M1 M3 M3 M2 M3
where M1, M2 and M3 stand for arbitrary merchant names
as they might appear in a sequence of transactions by a
consumer. For the purposes of this example, intervening
data, such as dates of transactions, amounts, transaction
identifiers, and the like, are ignored. Further assume a
co-occurrence window with a size=3. Here, the
co-occurrence window is based on a simple count of items
or transactions, and thus the co-occurrence window
represents a group of three transactions in sequence.
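
As an illustration only, the three counting variants for this example sequence can be sketched in Python. The function names and the dictionary-of-pairs layout are assumptions made for this sketch, not part of the described system; the window is taken to mean up to three neighbors following (or preceding) each target.

```python
from collections import Counter

def forward_cooccurrences(sequence, window=3):
    """Count (target, neighbor) events where neighbors follow the target."""
    counts = Counter()
    for pos, target in enumerate(sequence):
        # Up to `window` items after the target are its neighbors.
        for neighbor in sequence[pos + 1 : pos + 1 + window]:
            counts[(target, neighbor)] += 1
    return counts

def transpose(counts):
    """Swap target and neighbor; this yields the backward counts."""
    return Counter({(n, t): c for (t, n), c in counts.items()})

def bidirectional(counts):
    """Sum of forward and backward counts; symmetric by construction."""
    return counts + transpose(counts)

sequence = ["M1", "M3", "M1", "M3", "M3", "M2", "M3"]
forward = forward_cooccurrences(sequence)
backward = transpose(forward)
both = bidirectional(forward)
```

Under these assumptions the forward count for the pair (M1, M3) is 4, and the sequence produces 15 forward co-occurrence events in total.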

i) Forward Co-occurrence Counting

The first step in the counting process is to set up the forward
co-occurrence windows. FIG. 6a illustrates the co-occurrence windows
602 for forward co-occurrence counting of this sequence of merchant
names. By definition, each merchant name is a target 604, indicated
by an arrow, for one and only one co-occurrence window 602.
Therefore, in this example there are seven forward co-occurrence
windows 602, labeled 1 through 7. The other merchant names within a
given co-occurrence window 602 are called the neighbors 606. In
forward co-occurrence counting, the neighbors occur after the
target. For window size=3 there can be at most three neighbors 606
within a given co-occurrence window 602. Obviously, the larger the
window size, the more merchants (and transactions) are deemed to
co-occur at a time.

The next step is to build a table containing all co-occurrence
events. A co-occurrence event is simply a pairing of a target 604
with a neighbor 606. For the co-occurrence window #1 in FIG. 6a, the
target is M1 and the neighbors are M3, M1, and M3. Therefore, the
co-occurrence events in this window are: (M1, M3), (M1, M1), and
(M1, M3). Table 5 contains the complete listing of co-occurrence
events for every co-occurrence window in this example.

TABLE 5

Forward co-occurrence event table

Co-occurrence
Window           Target    Neighbor
1                M1        M3
1                M1        M1
1                M1        M3
2                M3        M1
2                M3        M3
2                M3        M3
3                M1        M3
3                M1        M3
3                M1        M2
4                M3        M3
4                M3        M2
4                M3        M3
5                M3        M2
5                M3        M3
6                M2        M3

The last step is to tabulate the number of times that each unique
co-occurrence event occurred. A unique co-occurrence event is the
combination (in any order) of two merchant names. Table 6 shows this
tabulation in matrix form. The rows indicate the targets and the
columns indicate the neighbors. For future reference, this matrix
will be called the forward co-occurrence matrix.

TABLE 6

Forward Co-occurrence matrix

                 Neighbor
Target    M1    M2    M3    Total
M1        1     1     4     6
M2        0     0     1     1
M3        1     2     5     8
Total     2     3     10    15

ii) Backward Co-occurrence Counting

Backward co-occurrence counting is done in the same manner as
forward co-occurrence counting except that the neighbors precede the
target in the co-occurrence windows. FIG. 6b illustrates the
co-occurrence windows for the same sequence of merchant names for
backward co-occurrence counting.

Once the co-occurrence windows are specified, the co-occurrence
events can be identified and counted.

TABLE 7

Backward co-occurrence event table

Co-occurrence
Window           Target    Neighbor
1                M3        M2
1                M3        M3
1                M3        M3
2                M2        M3
2                M2        M3
2                M2        M1
3                M3        M3
3                M3        M1
3                M3        M3
4                M3        M1
4                M3        M3
4                M3        M1
5                M1        M3
5                M1        M1
6                M3        M1

The number of times that each unique co-occurrence event occurred is
then recorded in the backward co-occurrence matrix.

Backward Co-occurrence matrix

                 Neighbor
Target    M1    M2    M3    Total
M1        1     0     1     2
M2        1     0     2     3
M3        4     1     5     10
Total     6     1     8     15



Note that the forward co-occurrence matrix and the backward
co-occurrence matrix are the transpose of one another. This
relationship is intuitive, because backward co-occurrence counting
is the same as forward co-occurrence counting with the transaction
stream reversed. Thus, there is no need to do both counts; either
count can be used, and the transpose of the resulting co-occurrence
matrix taken to obtain the other.

iii) Bi-directional Co-occurrence Counting

The bi-directional co-occurrence matrix is just the sum of the
forward co-occurrence matrix and the backward co-occurrence matrix.
The resulting matrix will always be symmetric. In other words, the
co-occurrence between merchant names A and B is the same as the
co-occurrence between merchant names B and A. This property is
desirable because this same symmetry is inherent in vector space;
that is, for merchant vectors $V_A$ and $V_B$ for merchants A and B,
$V_A \cdot V_B = V_B \cdot V_A$. For this reason, the preferred
embodiment uses the bi-directional co-occurrence matrix.

Bi-directional Co-occurrence matrix

                 Neighbor
Target    M1    M2    M3    Total
M1        2     1     5     8
M2        1     0     3     4
M3        5     3     10    18
Total     8     4     18    30

FIGS. 7a and 7b illustrate the above concepts in the context of
consumer transaction data in the master files 408. In FIG. 7a there
is shown a portion of the master file 408 containing transactions of
a particular customer. This data is prior to the stemming and
equivalencing steps described above, and so includes the original
names of the merchants with spaces, store numbers and locations and
other extraneous data.

FIG. 7b illustrates the same data after stemming and equivalencing.
Notice that the two transactions at STAPLES which previously
identified a store number are now equivalenced. The two car rental
transactions at ALAMO which previously included the location are
equivalenced to ALAMO, as are two hotel stays at HILTON which also
previously included the hotel location. Further note that the HILTON
transactions specified the location prior to the hotel name. Finally
the two transactions at NORDSTROMS which previously identified a
department have been equivalenced to the store name itself.

Further, a single forward co-occurrence window 700 is shown with the
target 702 being the first transaction at the HILTON, and the next
three transactions being neighbors 704.

Accordingly, following the updating of the master files 408 with the
stemmed and equivalenced names, the merchant vector generation
module 510 performs the following steps for each consumer account:

1. Read the transaction data in date order.

2. Forward count the co-occurrences of merchant names in the
transaction data, using a predetermined co-occurrence window.

3. Generate the forward co-occurrence, backward co-occurrence and
bi-directional co-occurrence matrixes.

One preferred embodiment uses a co-occurrence window size of three
transactions. This captures the transactions as the co-occurring
events (and not the presence of merchant names within three words of
each other) based only on sequence. In an alternate embodiment the
co-occurrence window is time-based using a date range in order to
identify co-occurring events. For example, with a co-occurrence
window of 1 week, given a target transaction, a co-occurring
neighbor transaction occurs within one week of the target
transaction. Yet another date approach is to define the target not
as a transaction, but rather as a target time period, and then the
co-occurrence window as another time period. For example, the target
period can be a three month block and so all transactions within the
block are the targets, and then the co-occurrence window may be all
transactions in the two months following the target period. Thus,
each merchant having a transaction in the target period co-occurs
with each merchant (same or other) having a transaction in the
co-occurrence period. Those of skill in the art can readily devise
alternate co-occurrence definitions which capture the sequence
and/or time related principles of co-occurrence in accordance with
the present invention.

b) Estimating Expected Co-occurrence Counts

In order to determine whether two merchants are related, the UDL
algorithm uses an estimate of the number of times transactions at
such merchants would be expected to co-occur. Suppose the only
information known about transaction data is the number of times that
each merchant name appeared in co-occurrence events. Given no
additional information, the correlation between any two merchant
names, that is, how strongly they are related, cannot be determined.
In other words, we would be unable to determine whether the
occurrence of a transaction at one merchant increases or decreases
the likelihood of occurrence of a transaction at another merchant.

Now suppose that it is desired to predict the number of times two
arbitrary merchants, merchant_i and merchant_j, co-occur. In the
absence of any additional information we would have to assume that
merchant_i and merchant_j are not correlated. In terms of
probability theory, this means that the occurrence of a transaction
at merchant_i will not affect the probability of the occurrence of a
transaction at merchant_j:

$$P_{j|i} = P_j \qquad [1]$$

The joint probability of merchant_i and merchant_j is given by

$$P_{ij} = P_i P_{j|i} \qquad [2]$$

Substituting $P_j$ for $P_{j|i}$ into equation [2] gives

$$P_{ij} = P_i P_j \qquad [3]$$

However, the true probabilities $P_i$ and $P_j$ are unknown, and so
they must be estimated from the limited information given about the
data. In this scenario, the maximum likelihood estimate $\hat{P}$
for $P_i$ and $P_j$ is

$$\hat{P}_i = T_i / T \qquad [4]$$

$$\hat{P}_j = T_j / T \qquad [5]$$

where

$T_i$ is the number of co-occurrence events that merchant_i
appeared in,

$T_j$ is the number of co-occurrence events that merchant_j
appeared in, and

$T$ is the total number of co-occurrence events.

These data values are taken from the bi-directional co-occurrence
matrix. Substituting these estimates into equation [3] produces
$$\hat{P}_{ij} = \hat{P}_i \hat{P}_j = T_i T_j / T^2 \qquad [6]$$

which is the estimate for $P_{ij}$.

Since there are a total of T independent co-occurrence events in the
transaction data, the expected number of co-occurring transactions
of merchant_i and merchant_j is

$$\hat{T}_{ij} = T \hat{P}_{ij} = T_i T_j / T \qquad [7]$$

This expected value serves as a reference point for determining the
correlation between any two merchants in the transaction data. If
two merchants co-occur significantly more often than the expected
value $\hat{T}_{ij}$, the two merchants are positively related.
Similarly, if two merchants co-occur significantly less often than
expected, the two merchants are negatively related. Otherwise, the
two merchants are practically unrelated.

Also, given the joint probability estimate $\hat{P}_{ij}$ and the
number of independent co-occurrence events T, the estimated
probability distribution function for the number of times that
merchant_i and merchant_j co-occur can be determined. It is well
known, from probability theory, that an experiment having T
independent trials (here transactions) and a probability of success
$\hat{P}_{ij}$ for each trial (success here being co-occurrence of
merchant_i and merchant_j) can be modeled using the binomial
distribution. The total number of successes k, which in this case
represents the number of co-occurrences of merchants, has the
following probability distribution:

$$\Pr[t_{ij} = k] = \binom{T}{k} \hat{P}_{ij}^{\,k} (1 - \hat{P}_{ij})^{T-k}, \qquad \binom{T}{k} = \frac{T!}{k!\,(T-k)!} \qquad [8]$$

This distribution has mean:

$$E[t_{ij}] = T \hat{P}_{ij} = T_i T_j / T \qquad [9]$$

which is the same value as was previously estimated using a
different approach. The distribution has variance:

$$\mathrm{Var}[t_{ij}] = T \hat{P}_{ij} (1 - \hat{P}_{ij}) = \frac{T_i T_j}{T} \left(1 - \frac{T_i T_j}{T^2}\right) \qquad [10]$$

The variance is used indirectly in UDL1, below. The standard
deviation of $t_{ij}$, $\sigma_{ij}$, is the square root of the
variance $\mathrm{Var}[t_{ij}]$. If merchant_i and merchant_j are
not related, the difference between the actual and expected
co-occurrence counts, $T_{ij} - \hat{T}_{ij}$, should not be much
larger than $\sigma_{ij}$.

c) Desired Dot-Products Between Merchant Vectors

To calculate the desired dot product ($d_{ij}$) between two merchant
vectors, the UDL algorithm compares the number of observed
co-occurrences (found in the bi-directional co-occurrence matrix) to
the number of expected co-occurrences. First, it calculates a raw
relationship measure ($r_{ij}$) from the co-occurrence counts, and
then it calculates a desired dot product from $r_{ij}$. There are at
least three different ways that the relationship strength and
desired dot product can be calculated from the co-occurrence data:

Method: UDL1

Method: UDL2

Method: UDL3
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/V=sign(7y-7V). J-^L =sign(Ty-fy)- 



[12] 



[13] 



[12] 



[14] 



[12] 
where $T_{ij}$ is the actual number of co-occurrence events for
merchant_i and merchant_j, and $\sigma_r$ is the standard deviation
of all the $r_{ij}$.

In UDL2 and UDL3, the log-likelihood ratio, $\ln\lambda$, is given
by:

$$\ln\lambda = T_{ij} \ln\frac{T_{ij}}{\hat{T}_{ij}} + (T_i - T_{ij}) \ln\frac{T_i - T_{ij}}{T_i - \hat{T}_{ij}} + (T_j - T_{ij}) \ln\frac{T_j - T_{ij}}{T_j - \hat{T}_{ij}} + (T - T_i - T_j + T_{ij}) \ln\frac{T - T_i - T_j + T_{ij}}{T - T_i - T_j + \hat{T}_{ij}} \qquad [15]$$

Each technique calculates the unexpected deviation, that is, the
deviation of the actual co-occurrence count from the expected
co-occurrence count. In terms of the previously defined variables,
the unexpected deviation is:

$$D_{ij} = T_{ij} - \hat{T}_{ij} \qquad [16]$$

Thus, $D_{ij}$ may be understood as a raw measure of unexpected
deviation.

As each method uses the same unexpected deviation measure, the only
difference between the techniques is that they use different
formulas to calculate $r_{ij}$ from $D_{ij}$. (Note that other
calculations of dot product may be used).
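
A minimal sketch of these quantities, assuming the expected count is $T_i T_j / T$ with the totals taken from the bi-directional co-occurrence matrix of the worked example, and assuming the binomial variance $T \hat{P}_{ij}(1 - \hat{P}_{ij})$ described in the text. The nested-dictionary layout and the function names are hypothetical choices for illustration.

```python
import math

# Bi-directional co-occurrence counts from the worked example
# (rows are targets, columns are neighbors).
counts = {
    "M1": {"M1": 2, "M2": 1, "M3": 5},
    "M2": {"M1": 1, "M2": 0, "M3": 3},
    "M3": {"M1": 5, "M2": 3, "M3": 10},
}

# T_i: number of co-occurrence events each merchant appeared in.
row_totals = {m: sum(row.values()) for m, row in counts.items()}
# T: total number of co-occurrence events.
total = sum(row_totals.values())

def expected_count(i, j):
    """Expected co-occurrences if the two merchants are unrelated."""
    return row_totals[i] * row_totals[j] / total

def unexpected_deviation(i, j):
    """Actual count minus expected count."""
    return counts[i][j] - expected_count(i, j)

def binomial_sigma(i, j):
    """Standard deviation of the binomial model for the pair."""
    p = row_totals[i] * row_totals[j] / total ** 2
    return math.sqrt(total * p * (1 - p))
```

For the pair (M1, M3) this gives an expected count of 4.8 against an observed count of 5, a deviation well inside one standard deviation, so under these assumptions the pair would be treated as practically unrelated.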

The first technique, UDL1, defines $r_{ij}$ to be the unexpected
deviation $D_{ij}$ divided by the standard deviation of the
predicted co-occurrence count. This formula for the relationship
measure is closely related to chi-squared ($\chi^2$), a
significance measure commonly used by statisticians. In fact



55 



(HI 



■ v v 



Tutu 



For small count situations, i.e. when $\hat{T}_{ij} \ll 1$, UDL1
gives overly large values for $r_{ij}$. For example, in a typical
retail transaction data set, which has more than 90% small counts,
values of $r_{ij}$ on the order of $10^9$ have been seen. Data sets
having such a high percentage of large relationship measures can be
problematic, because in these cases $\sigma_r$ also becomes very
large. Since the same $\sigma_r$ is used by all co-occurrence pairs,
large values of $\sigma_r$ cause
$r_{ij} / \sigma_r$

to become very small for pairs that do not suffer from small counts.
Therefore in these cases $d_{ij}$ becomes

$$d_{ij} = \tanh\left(\frac{r_{ij}}{\sigma_r}\right) \approx 0 \qquad [18]$$

This property is not desirable, because it forces the merchant
vectors of two merchants to be orthogonal, even when the two
merchants co-occur significantly more often than expected.

The second technique, UDL2, overcomes the small count problem by
using log-likelihood ratio estimates to calculate $r_{ij}$. It has
been shown that log-likelihood ratios have much better small count
behavior than $\chi^2$, while at the same time retaining the same
behavior as $\chi^2$ in the non-small count regions.

The third technique, UDL3, is a slightly modified version of UDL2.
The only difference is that the log likelihood ratio estimate is
scaled. This scaling removes a bias from the log likelihood ratio
estimate. The preferred embodiment uses UDL2 in most cases.

Accordingly, the present invention generally proceeds as follows:

1. For each pair of root merchant names, determine the expected
number of co-occurrences of the pair from the total number of
co-occurrence transactions involving each merchant name (with any
merchant) and the total number of co-occurrence transactions.

2. For each pair of root merchant names, determine a relationship
strength measure based on the difference between the expected number
of co-occurrences and the actual number of co-occurrences.

3. For each pair of root merchant names, determine a desired dot
product between the merchant vectors from the relationship strength
measure.

d) Merchant Vector Training

The goal of vector training is to position the merchant vectors in a
high-dimensional vector space such that the dot products between
them closely approximate their desired dot products. (In a preferred
embodiment, the vector space has 280 dimensions, though more or
fewer could be used). Stated more formally:

Given a set of merchant vectors $V = \{V_1, V_2, \ldots, V_N\}$, and
the set of desired dot products for each pair of vectors
$\{d_{12}, d_{13}, \ldots, d_{1N}, d_{21}, d_{23}, \ldots\}$,
position each merchant vector such that a cost function is
minimized, e.g.:

$$E = \sum_{i} \sum_{j \neq i} (d_{ij} - V_i \cdot V_j)^2 \qquad [19]$$

In a typical master file 408 of typical transaction data, the set of
merchant vectors contains ten thousand or more vectors. This means
that if it is desired to find the optimal solution, a system of ten
thousand or more high-dimensional linear equations must be solved.
This calculation is normally prohibitive given the types of time
frames in which the information is desired. Therefore, alternative
techniques for minimizing the cost function are preferred.

One such approach is based on gradient descent. In this technique,
the desired dot product is compared to the actual dot product for
each pair of merchant vectors. If the dot product between a pair of
vectors is less than desired, the two vectors are moved closer
together. If the dot product between a pair of vectors is greater
than desired, the two vectors are moved farther apart. Written in
terms of vector equations, this update rule is:

$$V_i(n+1) = V_i(n) + \alpha\,(d_{ij} - V_i \cdot V_j)\,V_j \qquad [20]$$

$$V_i(n+1) = \frac{V_i(n+1)}{\lVert V_i(n+1) \rVert} \qquad [21]$$

$$V_j(n+1) = V_j(n) + \alpha\,(d_{ij} - V_i \cdot V_j)\,V_i \qquad [22]$$

$$V_j(n+1) = \frac{V_j(n+1)}{\lVert V_j(n+1) \rVert} \qquad [23]$$

This technique converges as long as the learning rate ($\alpha$) is
sufficiently small (and determined by analysis of the particular
transaction data being used; typically in the range 0.1-0.5),
however the convergence may be very slow.
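
The gradient descent update for a single pair of merchant vectors can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, vectors are plain Python lists, and both vectors are assumed to be updated from their pre-update values before being renormalized to unit length.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def update_pair(v_i, v_j, desired, rate=0.1):
    """One update of a vector pair toward the desired dot product."""
    # Positive error: the dot product is below the desired value,
    # so the vectors are moved closer together.
    error = desired - dot(v_i, v_j)
    new_i = normalize([a + rate * error * b for a, b in zip(v_i, v_j)])
    new_j = normalize([b + rate * error * a for a, b in zip(v_i, v_j)])
    return new_i, new_j

# Two initially orthogonal unit vectors with a desired dot product of 0.5.
v_a, v_b = [1.0, 0.0], [0.0, 1.0]
for _ in range(200):
    v_a, v_b = update_pair(v_a, v_b, desired=0.5)
```

With a learning rate of 0.1 the error in this two-vector case shrinks by a roughly constant factor per step, which illustrates why convergence over many thousands of interacting pairs can be slow.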

An alternative methodology uses averages of merchant vectors. In
this embodiment, the desired position of a current merchant vector
is determined with respect to each other merchant vector given the
current position of the other merchant vector, and the desired dot
product between the current and other merchant vector. An error
weighted average of these desired positions is then calculated, and
taken as the final position of the current merchant vector. Written
in terms of vector equations, the update rule is:

[24]

where $V_i(n+1)$ is the updated position of the current merchant
vector $V_i$, and $U_{ij}$ is the desired position of the current
merchant vector with respect to each other merchant vector $V_j$.
$U_{ij}$ may be calculated using the formula:

[25]

where $d_{ij}$ is the desired dot product between $V_i$ and $V_j$,
and $\epsilon_{ij}$ is the current dot product between $V_i$ and
$V_j$.

Since $U_{ij}$ is a linear combination of merchant vectors $V_i$ and
$V_j$, it will always be in the plane of these vectors $V_i$ and
$V_j$.
The result of any of these various approaches is a final set 
of merchant vectors for all merchant names. 

Appendix II below, provides a geometrically derived 
algorithm for the error weighted update process. Appendix 
III provides an algebraically derived algorithm of this 
process, which results in an efficient code implementation, 
and which produces the same results as the algorithm of 
Appendix II. 

Those of skill in the art will appreciate that the UDL 
algorithm, including its variants above, and the implemen- 
tations in the appendices, may be used in contexts outside of 
determining merchant co-occurrences. This aspect of the 
present invention may be used for vector representation and
co-occurrence analysis in any application domain, for 
example, where there is need for representing high fre- 
quency data items without exclusion. Thus, the UDL algo- 
rithm may be used in information retrieval, document 
routing, and other fields of information analysis. 

3. Clustering Module 

Following generation and training of the merchant 
vectors, the clustering module 520 is used to cluster the 
resulting merchant vectors and identify the merchant seg- 
ments. Various different clustering algorithms may be used, 
including k-means clustering (MacQueen). The output of the 
clustering is a set of merchant segment vectors, each being 
the centroid of a merchant segment, and a list of merchant 
vectors (thus merchants) included in the merchant segment. 
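
As a rough illustration of the clustering step, the sketch below runs a plain batch k-means from caller-supplied starting centroids. It is an assumption-laden example: the text names k-means only as one possible algorithm, the batch form used here differs from MacQueen's incremental formulation, and the function name, squared Euclidean distance, and toy vectors are all hypothetical.

```python
def kmeans(vectors, centroids, iterations=10):
    """Batch k-means: assign each vector to its nearest centroid,
    then move each centroid to the mean of its members."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for vec in vectors:
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroids[c])),
            )
            clusters[nearest].append(vec)
        # An empty cluster keeps its previous centroid.
        centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Toy merchant vectors forming two obvious groups.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
segment_vectors, members = kmeans(vectors, [[1.0, 0.0], [0.0, 1.0]])
```

Each returned centroid plays the role of a merchant segment vector, and each member list plays the role of the merchants included in that segment.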

There are two different clustering approaches that may be 
usefully employed to generate the merchant segments. First, 
clustering may be done on the merchant vectors themselves. 
This approach looks for merchants having merchant vectors 
which are substantially aligned in the vector space, and 
clusters these merchants into segments and computes a 
cluster vector for each segment. Thus, merchants for whom 
transactions frequently co-occur and have high dot products
between their merchant vectors will tend to form merchant 
segments. Note that it is not necessary for all merchants in 
a cluster to all co-occur in many consumers' transactions. 
Instead, co-occurrence is associative: if merchants A and B 
co-occur frequently, and merchants B and C co-occur 
frequently, A and C are likely to be in the same merchant 
segment. 

A second clustering approach is to use the consumer 
vectors. For each account identifier, a consumer vector is 
generated as the summation of the vectors of the merchants 
at which the consumer has purchased in a defined time 
interval, such as the previous three months. A simple 
embodiment of this is:

$$C = \sum_{i=1}^{N} V_i \qquad [26]$$

where C is the consumer vector for an account, N is the number of
unique root merchant names in the customer account's transaction
data within a selected time period, and $V_i$ is the merchant vector
for the i-th unique root merchant name. The consumer vector is then
normalized to unit length.

A more interesting consumer vector takes into account various
weighting factors to weight the significance of each merchant's
vector:

$$C = \sum_{i=1}^{N} W_i V_i \qquad [27]$$

where $W_i$ is a weight applied to the merchant vector $V_i$. For
example, a merchant vector may be weighted by the total (or
average) purchase amount by the consumer at the merchant
in the time period, by the time since the last purchase, by the
total number of purchases in the time period, or by other
factors.
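
Both forms of the consumer vector can be sketched in a few lines. The function name and the list-based vector representation are assumptions for illustration, and normalizing the weighted form to unit length is also an assumption, since the text states the normalization only for the simple form.

```python
import math

def consumer_vector(merchant_vectors, weights=None):
    """Sum (optionally weighted) merchant vectors and scale to unit length."""
    if weights is None:
        weights = [1.0] * len(merchant_vectors)
    dims = len(merchant_vectors[0])
    total = [0.0] * dims
    for weight, vec in zip(weights, merchant_vectors):
        for k in range(dims):
            total[k] += weight * vec[k]
    norm = math.sqrt(sum(x * x for x in total))
    # A zero vector cannot be normalized; return it unchanged.
    return [x / norm for x in total] if norm else total

simple = consumer_vector([[1.0, 0.0], [0.0, 1.0]])
# Hypothetical weights, e.g. total purchase amounts at each merchant.
weighted = consumer_vector([[1.0, 0.0], [0.0, 1.0]], weights=[3.0, 1.0])
```

The weighted vector leans toward the first merchant, reflecting its larger weight.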

However computed, the consumer vectors can then be 
clustered, so that similar consumers, based on their purchas- 
ing behavior, form a merchant segment. This defines a 
merchant segment vector. The merchant vectors which are 
closest to a particular merchant segment vector are deemed 
to be included in the merchant segment. 

With the merchant segments and their segment vectors,
the predictive models for each segment may be developed.
Before discussing the creation of the predictive models, the
training data used in this process is described.

F. Data Postprocessing Module 

Following identification of merchant segments, a predictive
model of consumer spending in each segment is generated
from past transactions of consumers in the merchant
segment. Using the past transactions of consumers in the
merchant segment provides a robust base on which to
predict future spending, and since the merchant segments
were identified on the basis of the actual spending patterns
of the consumers, the arbitrariness of conventional
demographic based predictions is minimized. Additional
non-segment specific transactions of the consumer may also be
used to provide a base of transaction behavior.

To create the segment models, the consumer transaction 
data is organized into groups of observations. Each obser- 
vation is associated with a selected end-date. The end-date 
divides the observation into a prediction window and an 
input window. The input window includes a set of transac- 
tions in a defined past time interval prior to the selected 
end-date (e.g. 6 months prior). The prediction window
includes a set of transactions in a defined time interval after 
the selected end-date (e.g. the next 3 months). The predic- 
tion window transactions are the source of the dependent 
variables for the prediction, and the input window transac- 
tions are the source of the independent variables for the 
prediction. 
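
The split of one observation into an input window and a prediction window can be sketched as below. This is only an illustration: transactions are assumed to be `(date, amount)` tuples, the selected end-date is assumed to fall on the first day of a month, and the helper names are hypothetical.

```python
from datetime import date

def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def split_observation(transactions, end_date, input_months=6, prediction_months=3):
    """Return (input_window, prediction_window) around the end date."""
    start = add_months(end_date, -input_months)
    stop = add_months(end_date, prediction_months)
    input_window = [t for t in transactions if start <= t[0] < end_date]
    prediction_window = [t for t in transactions if end_date <= t[0] < stop]
    return input_window, prediction_window

transactions = [
    (date(1999, 12, 31), 9.0),
    (date(2000, 3, 15), 20.0),
    (date(2000, 6, 30), 5.0),
    (date(2000, 7, 1), 7.5),
    (date(2000, 9, 1), 1.0),
]
inputs, targets = split_observation(
    transactions, date(2000, 7, 1), input_months=4, prediction_months=2
)
# A simple dependent variable: total amount spent in the prediction window.
predicted_target = sum(amount for _, amount in targets)
```

Here the input window covers March through June and the prediction window covers July and August, so only the July 1 transaction contributes to the dependent variable.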

More particularly, the inputs for the observation generation
module 530 are the master files 408. The output is a set of
observations for each account. Each account receives three
types of observations. FIG. 8 illustrates the observation
types.

The first type of observations are training observations,
which are used to train the predictive models that predict
future spending within particular merchant segments. If N is
the length (in months) of the window over which observa- 
tion inputs are computed then there are -2N-1 training 
observations for each segment. 

In FIG. 8, there are shown 16 months of transaction data,
from March of one year to June of the next. Training
observations are selected prior to the date of interest,
November 1. The input window includes the 4 months of
past data to predict the next 2 months in the prediction
window. The first input window 802a thus uses a selected
date of July 1 and includes March-June to encompass the past
transactions; transactions in July-August form the
prediction window 803a. The next input window 802b uses
August 1 as the selected date, with transactions in April-July
as the past transactions and August-September as prediction
window 803b. The last input window for this set is 802d,
which uses November 1 as its selected date, with a
prediction window 803d of observations in November-December.

The second type of observations are blind observations.
Blind observations are observations where the prediction
window does not overlap any of the time frames for the
prediction windows in the training observations. Blind
observations are used to evaluate segment model performance.
In FIG. 8, the blind observations 804 include those
from September to February, as illustrated.

The third observation type is action observations, which
are used in a production phase. Action observations have
only inputs (past transactions given a selected date) and no
target transactions after the selected date. These are
preferably constructed with an input window that spans the final
months of available data. These transactions are the ones on
which the actual predictions are to be made. Thus, they
should be the transactions in an input window that extends
from a recent selected date (e.g. most recent end of month),
back the length of the input window used during training. In
FIG. 8, the action observations 806 span November 1 to end
of February, with the period of actual prediction being from
March to end of May.

FIG. 8 also illustrates that at some point during the 
prediction window, the financial institution sends out pro- 
motions to selected consumers based on their predicted 
spending in the various merchant segments. 

Referring to FIG. 4b again, the DPPM takes the master
files 408, and a given selected end-date, and constructs for
each consumer, and then for each segment, a set of training
observations and blind observations from the consumer's
transactions, including transactions in the segment, and any
other transactions. Thus, if there are 300 segments, for each
consumer there will be 300 sets of observations. If the
DPPM is being used during production for prediction
purposes, then the set of observations is a set of action
observations.

For training purposes, the DPPM computes transaction
statistics from the consumer's transactions. The transaction
statistics serve as independent variables in the input
window, and as dependent variables from transactions in the
prediction window. In a preferred embodiment, these
variables are as follows:

Prediction window: The dependent variables are generally
any measure of amount or rate of spending by the consumer
in the segment in the prediction window. A simple measure
is the total dollar amount that was spent in the segment by
the consumer in the transactions in the prediction window.
Another measure may be the average amount spent at
merchants (e.g. total amount divided by number of
transactions).

Input window: The independent variables are various
measures of spending in the input window leading up to the
end date (though some may be outside of it). Generally, the
transaction statistics for a consumer can be extracted from
various groupings of merchants. These groups may be
defined as: 1) merchants in all segments; 2) merchants in the
merchant segment being modeled; 3) merchants whose
merchant vector is closest to the segment vector for the
segment being modeled (these merchants may or may not be in
the segment); and 4) merchants whose merchant vector is
closest to the consumer vector of the consumer.

One preferred set of input variables includes:

(1) Recency. The amount of time in months between the
current end date and the most recent transaction of the
consumer in any segment. Recency may be computed over
all available time and is not restricted to the input
window.

(2) Frequency. The number of transactions by a consumer 
in the input window preceding the end-date for all 
segments. 

(3) Monetary value of purchases. A measure of the 
amount of dollars spent by a customer in the input 
window preceding the end-date for all segments. The 
total or average, or other measures may be used. 

(4) Recency_segment. The amount of time in months 
between the current end date and the most recent 
transaction of the consumer in the segment. Recency 
may be computed over all available time and is not 
restricted to the input window. 

(5) Frequency_segment. The number of transactions in 
the segment by a customer in the input window pre- 
ceding the current end date. 

(6) Monetary_segment. The amount of dollars spent in 
the segment by a customer in the input window pre- 
ceding the current end date. 

(7) Recency nearest profile merchants. The amount of 
time in months between the current end date and the 
most recent transaction of the consumer in a collection 
of merchants that are nearest the consumer vector of the 
consumer. Recency may be computed over all available 
time and is not restricted to the input window. 

(8) Frequency nearest profile merchants. The number of 
transactions in a collection of merchants that are near- 
est the consumer vector of the consumer by the con- 
sumer in the input window preceding the current end 
date. 

(9) Monetary nearest profile merchants. The amount
of dollars spent in a collection of merchants that are
nearest the consumer vector of the consumer by the
consumer in the input window preceding the current
end date.

(10) Recency nearest segment merchants. The amount of 
time in months between the current end date and the 
most recent transaction of the consumer in a collection 
of merchants that are nearest the segment vector. 
Recency may be computed over all available time and 
is not restricted to the input window. 

(11) Frequency nearest segment merchants. The number 
of transactions in a collection of merchants that are 
nearest the segment vector by the consumer in the input 
window preceding the current end date. 

(12) Monetary nearest segment merchants. The amount of 
dollars spent in a collection of merchants that are 
nearest the segment vector by the consumer in the input 
window preceding the current end date. 

(13) Segment probability score. The probability that a 
consumer will spend in the segment in the prediction 
window given all merchant transactions for the con- 
sumer in the input window preceding the end date. A 
preferred algorithm estimates combined probability 
using a recursive Bayesian method. 

(14) Seasonality variables. It is assumed that the
fundamental period of the cyclic component is known. In the
case of seasonality, it can be assumed that the cycle is
twelve months. Two variables related to seasonality are
added to the model. The first variable codes the sine of
the date and the second variable codes the cosine of the
date. The calculations for these variables are:

$$\text{Sin Input} = \sin\left(2.0\,\pi \cdot \frac{\text{sample day of year}}{365}\right)$$

$$\text{Cos Input} = \cos\left(2.0\,\pi \cdot \frac{\text{sample day of year}}{365}\right)$$

(15) Segment Vector-Consumer Vector Closeness. As an
optional input, the dot product of the segment vector for
the segment and the consumer vector is used as an input
variable.

In addition to these transaction statistics, variables may be
defined for the frequency of purchase and monetary value
for all cases of segment merchants, nearest profile
merchants, and nearest segment merchants for the same
forward prediction window in the previous year(s).
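The recency, frequency, monetary, and seasonality inputs listed above can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout of a transaction, the helper names (months_between, rfm_features, seasonality_inputs), and the use of whole calendar months for recency are hypothetical and are not taken from the specification.

```python
import math
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (hypothetical helper)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def rfm_features(transactions, end_date, window_start, merchants=None):
    """Recency, frequency and monetary value for one consumer.

    transactions: list of (date, merchant_id, amount) tuples (assumed layout).
    merchants: optional set restricting the statistics to one merchant group,
               e.g. the merchants of a segment; None means all merchants.
    """
    selected = [t for t in transactions
                if t[0] <= end_date and (merchants is None or t[1] in merchants)]
    # Recency uses all available history, not only the input window.
    recency = (months_between(max(t[0] for t in selected), end_date)
               if selected else None)
    # Frequency and monetary value are restricted to the input window.
    in_window = [t for t in selected if t[0] >= window_start]
    frequency = len(in_window)
    monetary = sum(t[2] for t in in_window)
    return recency, frequency, monetary


def seasonality_inputs(sample_date):
    """Sine and cosine encoding of the day of year with a 365-day period."""
    angle = 2.0 * math.pi * sample_date.timetuple().tm_yday / 365
    return math.sin(angle), math.cos(angle)


txns = [
    (date(2000, 1, 15), "m1", 40.0),
    (date(2000, 5, 3), "m2", 25.0),
    (date(2000, 6, 20), "m1", 10.0),
]
end = date(2000, 7, 31)
start = date(2000, 2, 1)
all_rfm = rfm_features(txns, end, start)                    # all merchants
seg_rfm = rfm_features(txns, end, start, merchants={"m1"})  # one merchant group
```

Passing a different merchant set (segment merchants, nearest profile merchants, nearest segment merchants) yields the corresponding variant of each statistic.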

G. Predictive Model Generation 

The training observations for each segment are input into
the segment predictive model generation module 530 to
generate a predictive model for the segment. FIG. 9
illustrates the overall logic of the predictive model
generation process. The master files 408 are organized by
accounts, based on account identifiers, here illustratively,
accounts 1 through N. There are M segments, indicated by
segments 1 through M. The DPPM generates, for each
combination of account and merchant segment, a set of input
and blind observations. The respective observations for each
merchant segment M from the many accounts 1 ... N are input
into the respective segment predictive model M during
training. Once trained, each segment predictive model is
tested with the corresponding blind observations. Testing
may be done by comparing, for each segment, a lift chart
generated from the training observations with the lift chart
generated from blind observations. Lift charts are further
explained below.

The predictive model generation module 530 is preferably
a neural network, using a conventional multi-layer
organization, and backpropagation training. In a preferred
embodiment, the predictive model generation module 530 is
provided by HNC Software's Database Mining Workstation,
available from HNC Software of San Diego, Calif.

While the preferred embodiment uses neural networks for 
the predictive models, other types of predictive models may 
be used. For example, linear regression models may be used. 

H. Profiling Engine 

The profiling engine 412 provides analytical data in the
form of an account profile about each customer whose data
is processed by the system 400. The profiling engine is also
responsible for updating consumer profiles over time as new
transaction data for consumers is received. The account
profiles are objects that can be stored in a database 414 and
are used as input to the computational components of system
400 in order to predict future spending by the customer in
the merchant segments. The profile database 414 is
preferably ODBC compliant, thereby allowing the accounts
provider (e.g. financial institution) to import the data to
perform SQL queries on the customer profiles.

The account profile preferably includes a consumer
vector, a membership vector describing a membership value
for the consumer for each merchant segment, such as the
consumer's predicted spending in each segment in a
predetermined future time interval, and the recency,
frequency, and monetary variables as previously described
for predictive model training.

The profiling engine 412 creates the account profiles as
follows.

1. Membership Function: Predicted Spending in Each
Segment

The profile of each account holder includes a membership
value with respect to each segment. The membership value
is computed by a membership function. The purpose of the
membership function is to identify the segments with which
the consumer is most closely associated, that is, which best
represent the group or groups of merchants at which the
consumer has shopped, and is likely to shop in the future.

In a preferred embodiment, the membership function 
computes the membership value for each segment as the 
predicted dollar amount that the account holder will pur- 
chase in the segment given previous purchase history. The 
dollar amount is projected for a predicted time interval (e.g. 
3 months forward) based on a predetermined past time 
interval (e.g. 6 months of historical transactions). These two 
time intervals correspond to the time intervals of the input 
window and prediction windows used during training of the 
merchant segment predictive models. Thus, if there are 300 
merchant segments, then a membership value set is a list of 
300 predicted dollar amounts, corresponding to the respec- 
tive merchant segments. Sorting the list by the membership 
value identifies the merchant segments at which the con- 
sumer is predicted to spend the greatest amounts of money 
in the future time interval, given their spending historically. 

To obtain the predicted spending, certain data about each 
account is input in each of the segment predictive models. 
The input variables are constructed for the profile consistent 
with the membership function of the profile. Preferably, the 
input variables are the same as those used during model 
training, as set forth above. An additional input variable for 
the membership function may include the dot product 
between the consumer vector and the segment vector for the 
segment (if the models are so trained). The output of the 
segment models is a predicted dollar amount that the con- 
sumer will spend in each segment in the prediction time 
interval. 

2. Segment Membership Based on Consumer Vectors 

A second, alternate membership aspect of the account
profiles is membership based upon the consumer vector for
each account profile. The consumer vector is a summary
vector of the merchants that the account has shopped at, as
explained above with respect to the discussion of clustering.
In this aspect, the dot product of the consumer vector and
segment vector for the segment defines a membership value.
In this embodiment, the membership value list is a set of 300
dot products, and the consumer is a member of the merchant
segment(s) having the highest dot product(s).

With either one of these membership functions, the popu- 
lation of accounts that are members of each segment (based 
on the accounts having the highest membership values for 
each segment) can be determined. From this population, 
various summary statistics about the accounts can be gen- 
erated such as cash advances, purchases, debits, and the like. 
This information is further described below. 
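As a minimal sketch of the two membership functions just described, the following ranks segments for one consumer by a membership value. The function names and the toy two-dimensional vectors are assumptions made for illustration; in the predicted-spending variant the dictionary would hold the dollar outputs of the segment models rather than dot products.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def membership_by_dot_product(consumer_vector, segment_vectors):
    """Map each segment id to its dot product with the consumer vector."""
    return {seg: dot(consumer_vector, vec) for seg, vec in segment_vectors.items()}


def top_segments(membership, k=1):
    """Segment ids sorted by descending membership value, truncated to k."""
    return sorted(membership, key=membership.get, reverse=True)[:k]


# Toy data (hypothetical): three segments in a two-dimensional vector space.
segment_vectors = {"seg_a": [1.0, 0.0], "seg_b": [0.0, 1.0], "seg_c": [0.6, 0.8]}
consumer = [0.8, 0.6]
values = membership_by_dot_product(consumer, segment_vectors)
best = top_segments(values, k=2)
```

The same top_segments helper applies unchanged to a list of predicted dollar amounts, since both membership functions reduce to sorting one value per segment.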

3. Updating of Consumer Profiles 

As additional transactions of a consumer are received 
periodically (e.g. each month) the merchant vectors associ- 
ated with the merchants in the new transactions can be used 
to update the consumer vector, preferably using averaging 
techniques, such as exponential averaging over the desired 
time interval for the update. 

Updates to the consumer vector are preferably a function 
of dollars spent perhaps relative to the mean of the dollars 
spent at the merchant. Thus, merchant vectors are weighted 
in the new transaction period by both the time and the 
significance of transactions for the merchant by the con- 
sumer (e.g. weighted by dollar amount of transactions by 
consumer at merchant). One formula for weighting merchants
is:

$$W_i = S_i e^{\lambda t} \qquad [28]$$

where

$W_i$ is the weight to be applied to merchant i's merchant
vector;

$S_i$ is the dollar amount of transactions at merchant i in
the update time interval;

$t$ is the amount of time since the last transaction at
merchant i; and

$\lambda$ is a constant that controls the overall influence
of the merchant.
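One way the update might look in code is sketched below, assuming the weight takes the exponential form W_i = S_i * exp(lambda * t) with a negative lambda so that older activity counts less. The blending factor alpha, the function names, and the normalisation by the total weight are assumptions of this sketch and are not stated in the specification.

```python
import math


def merchant_weight(dollars, months_since_last, lam):
    """W_i = S_i * exp(lam * t); a negative lam down-weights older activity."""
    return dollars * math.exp(lam * months_since_last)


def update_consumer_vector(consumer_vec, new_activity, merchant_vecs,
                           lam=-0.1, alpha=0.3):
    """Blend the old consumer vector with a weighted mean of merchant vectors.

    new_activity: list of (merchant_id, dollars, months_since_last_transaction).
    alpha: exponential-averaging factor (an assumed parameter).
    """
    weights = [(m, merchant_weight(s, t, lam)) for m, s, t in new_activity]
    total = sum(w for _, w in weights)
    if total == 0:
        return list(consumer_vec)  # nothing to blend in
    dim = len(consumer_vec)
    mean_new = [sum(w * merchant_vecs[m][d] for m, w in weights) / total
                for d in range(dim)]
    return [(1 - alpha) * old + alpha * new
            for old, new in zip(consumer_vec, mean_new)]


merchant_vecs = {"m1": [1.0, 0.0], "m2": [0.0, 1.0]}
updated = update_consumer_vector(
    [0.5, 0.5],
    [("m1", 100.0, 0.0), ("m2", 100.0, 0.0)],
    merchant_vecs,
)
```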

The profiling engine 412 also stores a flag for each
consumer vector indicating the time of the last update.

I. Reporting Engine

The reporting engine 426 provides various types of seg- 
ment and account specific reports. The reports are generated 
by querying the profiling engine 412 and the account data- 
base for the segments and associated accounts, and tabulat- 
ing various statistics on the segments and accounts. 

1. Basic Reporting Functionality 
The reporting engine 426 provides functionality to: 

a) Search by merchant names, including raw merchant 
names, root names, or equivalence names. 

b) Sort merchant lists by merchant name, frequency of 
transactions, transaction amounts and volumes, number 
of transactions at merchant, or SIC code. 

c) Filter contents of report by number of transactions at 
merchant. 

The reporting engine 426 provides the following types of 
reports, responsive to these input criteria: 

2. General Segment Report 

For each merchant segment a very detailed and powerful 
analysis of the segment can be created in a segment report. 
This information includes: 

a) General Segment Information 

Merchant Cohesion: A measure of how closely clustered 
are the merchant vectors in this segment. This is the average 
of the dot products of the merchant vectors with the centroid 
vector of this segment. Higher numbers indicate tighter 
clustering. 

Number of Transactions: The number of purchase
transactions at merchants in this segment, relative to the
total number of purchase transactions in all segments,
providing a measure of how significant the segment is in
transaction volume.

Dollars Spent: The total dollar amount spent at merchants
in this segment, relative to the total dollar amount spent in
all segments, providing a measure of dollar volume for the
segment.

Most Closely Related Segments: A list of other segments
that are closest to the current segment. This list may be
ranked by the dot products of the segment vectors, or by a
measure of the conditional probability of purchase in the
other segment given a purchase in the current segment.

The conditional probability measure M is as follows:
P(A|B) is the probability of a purchase in segment A in the
next time interval (e.g. 3 months) given purchases in
segment B in the previous time interval (e.g. 6 months), and

$$M = \frac{P(A \mid B)}{P(A)}$$

If M>1, then a purchase in segment B positively influences
the probability of purchase in segment A, and if M<1 then a
purchase in segment B negatively influences a purchase in
segment A. This is because if purchases in segment B carry
no information about purchases in segment A, then
P(A|B)=P(A), so M=1. The values for P(A|B) are determined
from the co-occurrences of purchases at merchants in the two
segments, and P(A) is determined from the relative frequency
of purchases in segment A compared to all segments.

A farthest segments list may also be provided (e.g. with 
the lowest conditional probability measures). 
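The ratio M = P(A|B)/P(A) can be estimated from account histories roughly as follows. This sketch makes a simplifying assumption that differs from the text: both probabilities are estimated as fractions of accounts, whereas the specification derives P(A) from the relative frequency of purchases across segments. The function name and the data layout are hypothetical.

```python
def conditional_lift(accounts, seg_a, seg_b):
    """Estimate M = P(A|B) / P(A) from per-account purchase histories.

    accounts: list of (previous_segments, next_segments) pairs, each a set of
    segment ids purchased in the previous and the next time interval.
    Returns None when the ratio is undefined.
    """
    n = len(accounts)
    next_given_b = [nxt for prev, nxt in accounts if seg_b in prev]
    if n == 0 or not next_given_b:
        return None
    p_a = sum(1 for _, nxt in accounts if seg_a in nxt) / n
    if p_a == 0:
        return None
    p_a_given_b = sum(1 for nxt in next_given_b if seg_a in nxt) / len(next_given_b)
    return p_a_given_b / p_a


history = [
    ({"B"}, {"A"}),
    ({"B"}, {"A"}),
    ({"C"}, set()),
    ({"C"}, {"A"}),
]
m = conditional_lift(history, "A", "B")  # P(A|B) = 1.0, P(A) = 0.75
```

Here M is above 1, so in this toy data a purchase in segment B is associated with a higher chance of a later purchase in segment A.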



b) Segment Members Information 

Detailed information is provided about each merchant 
which is a member of a segment. This information com- 
prises: 

Merchant Name and SIC code; 

Dollar Bandwidth: The fraction of all the money spent in 
this segment that is spent at this merchant (percent); 

Number of transactions: The number of purchase trans- 
actions at this merchant; 

Average Transaction Amount: The average value of a 
purchase transaction at this merchant; 

Merchant Score: The dot product of this merchant's 
vector with the centroid vector of the merchant seg- 
ment. (A value of 1.0 indicates that the merchant vector 
is at the centroid); 

SIC Description: The SIC code and its description; 

This information may be sorted along any of the above 
dimensions. 
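The per-merchant figures above, together with the Merchant Cohesion value from the general segment information, might be tabulated as in the sketch below. It assumes unit-length merchant and centroid vectors (so that a score of 1.0 means the merchant sits at the centroid); the function name and the dictionary layout are illustrative only.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def segment_member_report(merchants, centroid):
    """Build per-merchant rows and the segment's cohesion.

    merchants: dict of merchant name -> (merchant vector, dollars spent there).
    Returns (rows sorted by descending merchant score, cohesion).
    """
    total_dollars = sum(dollars for _, dollars in merchants.values())
    rows = []
    for name, (vec, dollars) in merchants.items():
        rows.append({
            "merchant": name,
            # Merchant Score: dot product with the segment centroid.
            "score": dot(vec, centroid),
            # Dollar Bandwidth: share of segment dollars, in percent.
            "dollar_bandwidth_pct":
                100.0 * dollars / total_dollars if total_dollars else 0.0,
        })
    # Merchant Cohesion: average of the merchant scores.
    cohesion = sum(r["score"] for r in rows) / len(rows) if rows else 0.0
    rows.sort(key=lambda r: r["score"], reverse=True)
    return rows, cohesion


merchants = {
    "shop_x": ([1.0, 0.0], 300.0),
    "shop_y": ([0.6, 0.8], 100.0),
}
rows, cohesion = segment_member_report(merchants, centroid=[1.0, 0.0])
```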

c) Lift Chart 

A lift chart is useful for validating the performance of the
predictive models by comparing predicted spending in a
predicted time window with actual spending.

Table 10 illustrates a sample lift chart for a merchant
segment:

TABLE 10

A sample segment lift chart

            Cumulative      Cumulative          Cumulative
Bin         segment lift    segment lift in $   Population

1           5.56            $109.05             50,000
2           4.82            $94.42              100,000
3           3.82            $74.92              150,000
4           3.23            $63.38              200,000
5           2.77            $54.22              250,000
6           2.43            $47.68              300,000
7           2.20            $43.20              350,000
8           2.04            $39.98              400,000
9           1.88            $36.79              450,000
10          1.75            $34.35              500,000
11          1.63            $31.94              550,000
12          1.52            $29.75              600,000
13          1.43            $28.02              650,000
14          1.35            $26.54              700,000
15          1.28            $25.08              750,000
16          1.21            $23.81              800,000
17          1.16            $22.65              850,000
18          1.10            $21.56              900,000
19          1.05            $20.57              950,000
20          1.00            $19.60              1,000,000
Base-line                   $19.60



Lift charts are created generally as follows:

As before, there is a defined input window and prediction
window, for example 6 and 3 months respectively. Data from
the total length of these windows relative to the end of the
most recent spending data available is taken. For example,
if data on actual spending in the accounts is available
through the end of the current month, then the prior three
months of actual data will be used as the prediction window,
and the data for the six months prior to that will be data
for the input window. The input data is then used to
"predict" spending in the three month prediction window, for
which in fact there is actual spending data. The predicted
spending amounts are now compared with the actual amounts to
validate the predictive models.

For each merchant segment then, the consumer accounts
are ranked by their predicted spending for the segment in the
prediction window period. Once the accounts are ranked,
they are divided into N (e.g. 20) equal sized bins so that bin
1 has the highest ranking accounts, and bin N has the
lowest ranking accounts. This identifies the account holders
that the predictive model for the segment indicates are
expected to spend the most in this segment.

Then, for each bin, the average actual spending per
account in this segment in the past time period, and the
average predicted spending, are computed. The average actual
spending over all bins is also computed. This average actual
spending for all accounts is the baseline spending value (in
dollars), as illustrated in the last line of Table 10. This
number describes the average that all account holders spent
in the segment in the prediction window period.

The lift for a bin is the average actual spending by
accounts in the bin divided by the baseline spending value.
If the predictive model for the segment is accurate, then
those accounts in the highest ranked bins should have a lift
greater than 1, and the lift should generally increase toward
the top of the ranking, with bin 1 having the highest lift.
Where this is the case, as for example in bin 1 of Table 10,
this shows that those accounts in bin 1 in fact spent several
times the baseline, thereby confirming the prediction that
these accounts would in fact spend more than others in this
segment.

The cumulative lift for a bin is computed by taking the
average spending by accounts in that bin and all higher
ranking bins, and dividing it by the baseline spending (for
example, the cumulative lift for bin 3 is the average
spending per account in bins 1 through 3, divided by the
baseline spending). The cumulative lift for bin N is always
1.0. The cumulative lift is useful to identify a group of
accounts which are to be targeted for promotional offers.
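The lift and cumulative lift calculations described above can be sketched as follows. The sketch assumes that the number of accounts is an exact multiple of the number of bins and that the baseline is positive; the function name and the tuple layout are illustrative, not part of the described system.

```python
def lift_chart(accounts, n_bins):
    """Compute per-bin lift and cumulative lift.

    accounts: list of (predicted_spend, actual_spend) pairs, one per account.
    Returns (baseline, rows) where each row is (bin, lift, cumulative_lift).
    Assumes len(accounts) is a multiple of n_bins and the baseline is > 0.
    """
    # Rank by predicted spending; bin 1 holds the highest ranking accounts.
    ranked = sorted(accounts, key=lambda a: a[0], reverse=True)
    size = len(ranked) // n_bins
    baseline = sum(actual for _, actual in ranked) / len(ranked)
    rows = []
    running_total = 0.0
    for b in range(n_bins):
        actuals = [actual for _, actual in ranked[b * size:(b + 1) * size]]
        running_total += sum(actuals)
        lift = (sum(actuals) / size) / baseline
        cumulative = (running_total / ((b + 1) * size)) / baseline
        rows.append((b + 1, lift, cumulative))
    return baseline, rows


# Toy data: six accounts as (predicted, actual) spending in the segment.
accounts = [(90, 80.0), (70, 60.0), (50, 30.0), (40, 10.0), (20, 15.0), (10, 5.0)]
baseline, rows = lift_chart(accounts, n_bins=3)
```

As the text states, the cumulative lift of the last bin comes out at 1.0, because by then every account has been included in the average.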

The lift information allows the financial institution to very
selectively target a specific group of accounts (e.g. the
accounts in bin 1) with promotional offers related to the
merchants in the segment. This level of detailed, predictive
analysis of very discrete groups of specific accounts relative
to merchant segments is not believed to be currently
available by conventional methods.

d) Population Statistics Tables 

The reporting engine 426 further provides two types of
analyses of the financial behavior of a population of
accounts that are associated with a segment based on various
selection criteria. The Segment Predominant Scores Account
Statistics table and the Segment Top 5% Scores Account
Statistics table present averaged account statistics for two
different types of populations of customers who shop, or are
likely to shop, in a given segment. The two populations are
determined as follows.

Segment Predominant Scores Account Statistics Table: 
All open accounts with at least one purchase transaction are 
scored (predicted spending) for all of the segments. Within 
each segment, the accounts are ranked by score, and 
assigned a percentile ranking. The result is that for each 
account there is a percentile ranking value for each of the 
merchant segments. 

The population of interest for a given segment is defined
as those accounts which have their highest percentile
ranking in this segment. For example, if an account has its
highest percentile ranking in segment #108, that account
will be included in the population for the statistics table for
segment #108, but not in any other segment. This approach
assigns each account holder to one and only one segment.

Segment Top 5% Scores Account Statistics Table: For the
Segment Top 5% Scores Account Statistics table, the
population is defined as the accounts with a percentile
ranking of 95% or greater in a current segment. These are
the 5% of the population that is predicted to spend the most
in the segment in the predicted future time interval
following the input data time window. These accounts may
appear in this population in more than one segment, so that
high spenders will show up in many segments; concomitantly,
those who spend very little may not be assigned to any
segment.

The number of accounts in the population for each table 
is also determined and can be provided as a raw number, and 
as a percentage of all open accounts (as shown in the titles 
of the following two tables). 
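The two population definitions can be contrasted with a short sketch. It assumes that a percentile ranking is the percentage of accounts with a strictly lower score in the segment, and that ties are broken by sort order; both choices, like the function names, are assumptions made for illustration.

```python
def percentile_ranks(scores):
    """scores: dict of account -> score within one segment.

    Returns dict of account -> percentile in [0, 100), where a higher score
    gives a higher percentile (share of accounts ranked below it).
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {acct: 100.0 * i / n for i, acct in enumerate(ordered)}


def build_populations(segment_scores, top_pct=95.0):
    """segment_scores: dict of segment -> {account: predicted spending}.

    Returns (predominant, top): dicts of segment -> set of accounts.
    """
    ranks = {seg: percentile_ranks(s) for seg, s in segment_scores.items()}
    accounts = set().union(*(s.keys() for s in segment_scores.values()))
    predominant = {seg: set() for seg in segment_scores}
    top = {seg: set() for seg in segment_scores}
    for acct in accounts:
        by_seg = {seg: r[acct] for seg, r in ranks.items() if acct in r}
        # Segment predominant: the one segment with the highest percentile.
        predominant[max(by_seg, key=by_seg.get)].add(acct)
        # Top scores: every segment where the account clears the cutoff.
        for seg, pct in by_seg.items():
            if pct >= top_pct:
                top[seg].add(acct)
    return predominant, top


segment_scores = {
    "s1": {"a": 100.0, "b": 20.0, "c": 5.0, "d": 1.0},
    "s2": {"a": 90.0, "b": 50.0, "c": 95.0, "d": 60.0},
}
# With only four accounts, a 75% cutoff stands in for the 95% cutoff.
predominant, top = build_populations(segment_scores, top_pct=75.0)
```

Every account lands in exactly one predominant population, while accounts "b" and "d" appear in no top-scores population at all.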

Table 11 and Table 12 provide samples of these two types 
of tables: 

TABLE 11

Segment Predominant Scores Account Statistics:

                      Mean       Std         Population   Relative
Category              Value      Deviation   Mean         Score

Cash Advances         $11.28     $53.18      $6.65        169.67
Cash Advance Rate     0.03       0.16        0.02         159.92
Purchases             $166.86    $318.86     $192.91      86.50
Purchase Rate         0.74       1.29        1.81         40.62
Debits                $178.14    $324.57     $199.55      89.27
Debit Rate            0.77       1.31        1.84         41.99
Dollars in Segment    4.63       14.34       10.63%       43.53
Rate in Segment       3.32       9.64        11.89%       27.95



TABLE 12

Segment Top 5% Scores Account Statistics:
154786 accounts (3.10 percent)

                      Mean       Std         Population   Relative
Category              Value      Deviation   Mean         Score

Cash Advances         $9.73      $51.21      $7.27        133.79
Cash Advance Rate     0.02       0.13        0.02         125.62
Purchases             $391.54    $693.00     $642.06      60.98
Purchase Rate         2.76       4.11        7.51         36.77
Debits                $401.27    $702.25     $649.34      61.80
Debit Rate            2.79       4.12        7.53         37.00
Dollars in Segment    1.24       8.14        1.55%        80.03
Rate in Segment       0.99       6.70        1.79%        55.04



i) Segment Statistics 

The tables present the following statistics for each of 
several categories, one category per row. The statistics are: 
Mean Value: the average over the population being 
scored; 

Std Deviation: the standard deviation over the population 
being scored; 

Population Mean: the average, over all the segments, of
the Mean Value (this column is thus the same for all
segments, and is included for ease of comparison);
and

Relative Score: the Mean Value, as a fraction of the 
Population Mean (in percent). 
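The Relative Score definition can be checked against a row of the sample tables with a one-line calculation. The function name below is illustrative; the input figures are the Purchases and Debits rows of Table 11.

```python
def relative_score(mean_value, population_mean):
    """Mean Value expressed as a percentage of the Population Mean."""
    return 100.0 * mean_value / population_mean


# Purchases row of Table 11: Mean Value $166.86, Population Mean $192.91.
purchases_score = round(relative_score(166.86, 192.91), 2)
# Debits row of Table 11: Mean Value $178.14, Population Mean $199.55.
debits_score = round(relative_score(178.14, 199.55), 2)
```

Rows whose published means are rounded to only two significant digits (such as the rate rows) cannot be reproduced this precisely from the printed values.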

ii) Row Descriptions 

Each table contains rows for spending and rate in Cash 
Advances, Purchases, Debits, and Total Spending. 

The rows for spending (Cash Advances, Purchases, and 
Debits) show statistics on dollars per month for all 
accounts in the population over the time period of 
available data. 

The rate rows (Cash Advance Rate, Debit Rate, and
Purchase Rate) show statistics on the number of
transactions per month for all accounts in the population
over the time period of available data.

Debits consist of Cash Advances and Purchases.

The Dollars in Segment shows the fraction of total
spending that is spent in this segment. This informs the
financial institution of how significant overall this
segment is.

The Rate in Segment shows the fraction of total purchase
transactions that occur in this segment.

The differences between these two populations are subtle
but important, and are illustrated by the above tables. The
segment predominant population identifies those individuals
as members of a segment who, relative to their own
spending, are predicted to spend the most in the segment.
For example, assume a consumer whose predicted spending
in a segment is $20.00, which gives the consumer a
percentile ranking of 15th percentile. If the consumer's
percentile ranking in every other segment is below the 15th
percentile, then the consumer is selected in this population
for this segment. Thus, this may be considered an
intra-account membership function.

The Top 5% scores population instead includes those
account holders predicted to spend the most in the segment,
relative to all other account holders. Thus, the account
holder who was predicted to spend only $20.00 in the
merchant segment will not be a member of this population
since he is well below the 95th percentile, which may be
predicted to spend, for example, $100.00.

In the example tables these differences are pronounced. In
Table 11, the average purchases of the segment predominant
population is only $166.86. In Table 12, the average
purchase by the top 5% population is more than twice that, at
$391.54. This information allows the financial institution to
accurately identify accounts which are most likely to spend
in a given segment, and target these accounts with
promotional offers for merchants in the segment.

The above tables may also be constructed based on other
functions to identify accounts associated with segments,
including dot products between consumer vectors and
segment vectors.

J. Targeting Engine 

The targeting engine 422 allows the financial institution to
specify targeted populations for each (or any) merchant
segment, to enable selection of the targeted population for
receiving predetermined promotional offers.

A financial institution can specify a targeted population
for a segment by specifying a population count for the
segment, for example, the top 1000 account holders, or the
top 10% of account holders in a segment. The selection is
made by any of the membership functions, including dot
product, or predicted spending. Other targeting
specifications may be used in conjunction with these
criteria, such as a minimum spending amount in the segment,
such as $100. The parameters for selecting the targeting
population are defined in a target specification document
424 which is an input to the targeting engine 422. One or
more promotions can be specifically associated with certain
merchants in a segment, such as the merchants with the
highest correlation with the segment vector, highest average
transaction amount, or other selective criteria. In
addition, the amounts offered in the promotions can be
specific to each consumer selected, and based on their
predicted or historical spending in the segment. The amounts
may also be dependent on the specific merchant for whom a
promotion is offered, as a function of the merchant's
contributions to purchases in the segment, such as based
upon their dollar bandwidth, average transaction amount, or
the like.
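A target selection of this kind might be sketched as below. The sketch assumes that accounts are first ranked by membership value, then cut to the requested count or fraction, and only then filtered by minimum segment spending; the text does not fix that order, and the function name and dictionary keys are hypothetical.

```python
def select_targets(accounts, count=None, top_fraction=None,
                   min_segment_spend=0.0):
    """Pick target account ids for one segment.

    accounts: list of dicts with 'id', 'membership' and 'segment_spend' keys.
    count / top_fraction: size of the cut (give one of them; if neither is
    given, all accounts pass the cut).
    """
    ranked = sorted(accounts, key=lambda a: a["membership"], reverse=True)
    if top_fraction is not None:
        count = int(len(ranked) * top_fraction)
    chosen = ranked[:count]
    # Apply the additional filter criterion after the ranking cut.
    return [a["id"] for a in chosen if a["segment_spend"] >= min_segment_spend]


accounts = [
    {"id": 1, "membership": 0.9, "segment_spend": 150.0},
    {"id": 2, "membership": 0.8, "segment_spend": 40.0},
    {"id": 3, "membership": 0.4, "segment_spend": 300.0},
    {"id": 4, "membership": 0.1, "segment_spend": 500.0},
]
top_half_over_100 = select_targets(accounts, top_fraction=0.5,
                                   min_segment_spend=100.0)
top_three = select_targets(accounts, count=3)
```

The membership key can hold either a dot product or a predicted spending amount, matching the two membership functions.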

The selected accounts can be used to generate a targeted
segmentation report 430 by providing the account identifiers
for the selected accounts to the reporting engine 426, which
constructs the appropriate targeting report on the segment.
This report has the same format as the general segment
report but is compiled for the selected population.

An example targeting specification 424 is shown below: 

TABLE 13

Target population specification

ID associated with               Customer
promotional offer    Segment ID  target count  Selection Criteria      Filter Criteria

1                    122         75,000        Predicted Spending in   Average Transaction in
                                               Segment                 Segment >$50
1                    143         Top 10%       Dot Product             Total Spending in
                                                                       Segment >$100
2                    12 and 55   87,000        Predicted Spending in   None
                                               Segments 12 and 55



Table 13 shows a specification of a total of at least 
228,000 customer accounts distributed over four segments 
and two promotional offers (ID 1 and ID 2). For each 
segment or promotional offer, there are different selection 
and filtering criteria. For promotion #1 the top 75,000 
consumers in segment #122 based on predicted spending, 
and who have an average transaction in the segment greater 
than $50, are selected. For this promotion in segment #413, 
the top 10% of accounts based on the dot product between 
the consumer vector and segment vector are selected, so 
long as they have a minimum spending in the segment of 
$100. Finally, for promotion #2, 87,000 consumers are 
selected across two segments. Within each offer (e.g. offer 
ID 1) the segment models may be merged to produce a single 
lift chart which reflects the offer as a composition of the 
segments. The targeting engine 422 then provides the fol- 
lowing additional functionality: 

1. Select fields from the account profile of the selected
accounts that will be inserted into the mail file 434. For
example, the name, address, and other information
about the account may be extracted.

2. The mail file 434 is then exported to a useful word 
processing or bulk mailing system. 

3. Instruct the reporting engine 426 to generate reports
that have summary and cumulative frequencies for
select account fields, such as purchase, debit,
cash advance, or any other account data.

4. Instruct the reporting engine 426 to generate lift charts 
for the targeting population in the segment, and for 
overlapped (combined) segments. 

K. Segment Transition Detection 

As is now apparent, the system of the present invention 
provides detailed insight into which merchant segments a 
consumer is associated with based on various measures of 
membership, such as dot product, predicted spending, and 
the like. Further, since the consumers continue to spend over
time, the consumer accounts and the consumers'
associations with segments are expected to change over time
as their individual spending habits change.

The present invention allows for detection of the changes 
in consumer spending via the segment transition detection 
engine 420. In a given data period (e.g. next monthly cycle 
or multiple month collection of data) a set of membership 
values for each consumer is defined as variously described 
above, with respect to each segment. Again, this may be 
predicted spending by the consumer in each segment, dot 
product between the consumer vector and each segment 
vector, or other membership functions. 

In a subsequent time interval, using additional spending 
and/or predicted data, the membership values are recomputed. 
Each consumer will have the top P increases and the bottom Q 
decreases in segment membership. That is, 
there will be two changes of interest: the P (e.g. 5) segments 
with the greatest increase in membership values for the 
consumer; and the Q segments with the greatest decrease in 
segment membership. 
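
As a rough sketch of this recomputation, the per-consumer changes can be obtained by differencing two sets of membership values and ranking the differences. The function name and the dictionary layout are assumptions made for illustration; the membership values themselves may be predicted spending, dot products, or any other membership function.

```python
def membership_transitions(previous, current, p=5, q=5):
    """Return (top P increases, top Q decreases) for one consumer.

    `previous` and `current` map segment id -> membership value.
    A segment missing from either mapping is treated as membership 0.0
    (an assumption; the text does not say how missing values are handled).
    """
    segments = sorted(set(previous) | set(current))
    deltas = {s: current.get(s, 0.0) - previous.get(s, 0.0) for s in segments}

    rising = [s for s in segments if deltas[s] > 0]
    falling = [s for s in segments if deltas[s] < 0]

    # Largest increases first; most negative changes first.
    rising.sort(key=lambda s: deltas[s], reverse=True)
    falling.sort(key=lambda s: deltas[s])

    increases = [(s, deltas[s]) for s in rising[:p]]
    decreases = [(s, deltas[s]) for s in falling[:q]]
    return increases, decreases
```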

An increase in the membership value for a segment 
indicates that the consumer is now spending (or is predicted 
to spend) more money in a particular segment. Decreases show a 
decline in the consumer's interest in the segment. Either of 
these movements may reflect a change in the consumer's 
lifestyle, income, or other demographic factors. 

Significant increases in merchant segments which previously 
had low membership values are particularly useful to 
target promotional offers to the account holders who are 
moving into the segment. This is because the significant 
increase in membership indicates that the consumer is most 
likely to be currently receptive to the promotional offers for 
merchants in the segment, since they are predicted to be 
purchasing more heavily in the segment. 

Thus, the segment transition detection engine 420 calculates 
the changes in each consumer's membership values 
between two selected time periods, typically using data in a 
most recent prediction window (either ending or beginning 
with a current statement date) relative to memberships in 
prior time intervals. The financial institution can define a 
threshold change value for selecting accounts with changes 
in membership more significant than the threshold. The 
selected accounts may then be provided to the reporting 
engine 426 for generation of various reports, including a 
segment transition report 432 which is like the general 
segment report except that it applies to accounts that are 
considered to have transitioned to or from a segment. This 
further enables the financial institution to selectively target 
these customers with promotional offers for merchants in the 
segments in which the consumer had the most significant 
positive increases in membership. 
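
The threshold-based selection of accounts can be illustrated with a short sketch. Everything named here (`select_transitioned`, the nested-dictionary layout, the optional `max_prior` cap used to isolate accounts that previously had low membership) is hypothetical and only meant to make the selection rule concrete.

```python
def select_transitioned(prior, recent, segment, threshold, max_prior=None):
    """Return (account_id, change) pairs whose membership in `segment`
    increased by more than `threshold`, largest change first.

    `prior` and `recent` map account id -> {segment id: membership value}.
    If `max_prior` is given, only accounts whose earlier membership was at
    or below it are kept, i.e. accounts newly moving into the segment.
    """
    selected = []
    for account_id, recent_values in recent.items():
        before = prior.get(account_id, {}).get(segment, 0.0)
        after = recent_values.get(segment, 0.0)
        change = after - before
        if change <= threshold:
            continue
        if max_prior is not None and before > max_prior:
            continue
        selected.append((account_id, change))
    selected.sort(key=lambda item: item[1], reverse=True)
    return selected
```

The resulting list is the kind of input a segment transition report could be built from; a mirror-image test on negative changes would pick out accounts moving away from a segment.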

In summary then, the present invention provides a variety 
of powerful analytical methods which predict consumer 
financial behavior in discretely defined merchant segments, 
and with respect to predetermined time intervals. The clustering 
of merchants in merchant segments allows analysis of 
transactions of consumers in each specific segment, both 
historically, and in the predicted period to identify consumers 
of interest. Identified consumers can then be targeted 
with promotional offers precisely directed at merchants 
within specific segments. 



APPENDIX I: N-gram Matching Algorithm 
1. A set of training examples is presented to the algorithm. 
In this case, the training examples are all the merchant 
names that are being processed. 

2. Each training example is broken down into all possible 
n-grams, for a selected value of n (n=3 for trigrams). E.g. 
the merchant name "wal-mart" yields the trigrams ^^w, 
^wa, wal, al-, l-m, -ma, mar, art, rt^, t^^, where ^ is an "end 
of string" token. 

3. The frequencies with which each trigram appears anywhere 
in the training examples are counted. 

4. In the preferred embodiment, each trigram is assigned a 
weight, given by 



where xyz indicates the particular trigram, F_xyz is the 
number of times the trigram appeared anywhere in the 
training examples, and N is the maximum value of F for 
all trigrams. Thus, frequently occurring trigrams are 
assigned low weights, while rare trigrams are assigned 
high weights. Other weighting schemes, including uni- 
form weights, are possible. 

5. A high dimensional vector space is constructed, with one 
dimension for each trigram that appears in the set of 
training examples. 

6. To compare two particular strings of characters (merchant 
names), string1 and string2, each string is represented by 
a vector in the vector space. The vector for string1 is 
constructed by: 

a) counting the frequency of each trigram in the string, 

b) assembling a weighted sum of unit vectors, 

$$V_{string1} = \sum_{xyz} w_{xyz} \, f_{xyz} \, u_{xyz}$$

where xyz ranges over all trigrams in string1, and $u_{xyz}$ 
is a unit vector in the direction of the xyz dimension in 
the vector space. 

c) normalizing $V_{string1}$ to a length of one (preferred 
embodiment), or utilizing another normalization, or 
providing no normalization at all. 

d) constructing the similar vector corresponding to the other 
string, $V_{string2}$. 

e) taking the dot product of $V_{string1}$ and $V_{string2}$. A high dot 
product (near one) indicates that the two strings are 
closely related, while a low dot product (near zero) 
indicates that the two strings are not related. 

7. Two merchant names are equivalenced if their vectors' dot 
product is greater than a particular threshold. This thresh- 
old is typically in the range of 0.6 to 0.9 for the preferred 
embodiment. 
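
The steps of Appendix I can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the preferred embodiment: the weight formula used here, 1 - log(F)/log(N), is only one scheme consistent with the description (frequent trigrams get low weights, rare ones high weights), the `^` padding character stands in for the "end of string" token, and trigrams unseen in training are given weight 1.0. All function names are hypothetical.

```python
import math
from collections import Counter

PAD = "^"  # stand-in for the "end of string" token


def trigrams(name, n=3):
    """Break a string into overlapping n-grams, padded at both ends."""
    padded = PAD * (n - 1) + name + PAD * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def trigram_weights(names):
    """Weight each trigram by an assumed formula: 1 - log(F) / log(N)."""
    freq = Counter(g for name in names for g in trigrams(name))
    max_freq = max(freq.values())
    if max_freq == 1:
        # Every trigram is equally rare; fall back to uniform weights.
        return {g: 1.0 for g in freq}
    return {g: 1.0 - math.log(f) / math.log(max_freq) for g, f in freq.items()}


def name_vector(name, weights):
    """Sparse, unit-length vector: weight * in-string frequency per trigram."""
    counts = Counter(trigrams(name))
    vec = {g: weights.get(g, 1.0) * c for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    if norm == 0.0:
        return vec
    return {g: v / norm for g, v in vec.items()}


def similarity(a, b, weights):
    """Dot product of the two normalized trigram vectors."""
    va, vb = name_vector(a, weights), name_vector(b, weights)
    return sum(v * vb.get(g, 0.0) for g, v in va.items())
```

Two names would then be equivalenced when `similarity(a, b, weights)` exceeds the chosen threshold. On very small training sets the weights are coarse, so scores can be much lower than they would be on a realistic corpus of merchant names.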

APPENDIX II: Geometrically Derived Vector 
Training Algorithm 

Initialize: 

For each stem, i ∈ {all stems in corpus} 
    V_i = rand_vector // random vector for stem i 
    Normalize V_i to length 1 
    ΔV_i = 0 // zero initialized update vector for stem i 
END 

For each stem, i ∈ {all stems in corpus} 
    Calculate Updates: 

    For each stem, j ∈ {all stems that co-occurred with stem i}, j ≠ i 

        We wish to calculate a new vector, U_ij, that is the ideal 
        position of V_i with respect to V_j. In other words, we 
        want the dot product of U_ij with V_j to be d_ij, we want 
        U_ij to have unit length, and we want U_ij to lie in the 
        plane defined by V_i and V_j. 

        D = V_i − V_j // vector difference between vectors for 
                      // stems j and i. 
        θ = D − V_j · dot(V_j, D) // θ is vector of components of D 
                      // which are orthogonal to V_j. This defines a 
                      // plane between V_j and θ in which V_i lies. 
        θ = θ / |θ| // normalize θ 
        l = sqrt((1 − d_ij²) / d_ij²) // l is weight for θ 
        IF d_ij > 0 THEN // if positive relationship between stems 
                         // j and i 
            U_ij = V_j + l · θ 
        ELSE IF d_ij < 0 THEN // if negative relationship 
            U_ij = −V_j + l · θ 
        END IF 
        U_ij = U_ij / |U_ij| // normalize 

        We construct a weighted sum of the U_ij for all j to 
        derive an estimate of where V_i should be. 
        IF weight_mode == LOG_FREQ THEN 
            ΔV_i = ΔV_i + U_ij · [1 − dot(U_ij, V_i)] · [1 + log F[j]] 
        ELSE IF weight_mode == FREQ THEN 
            ΔV_i = ΔV_i + U_ij · [1 − dot(U_ij, V_i)] · F[j] 
        ELSE 
            ΔV_i = ΔV_i + U_ij · [1 − dot(U_ij, V_i)] 
        END IF 
    END j 

    Perform Update: 

    V_i^new = (1 − gamma) · V_i + gamma · ΔV_i 
END i 

NOTES: 

1) Stems here are root merchant names. 

2) The list of stems j (merchant names) which co-occur with 
stem i is known from the co-occurrence data. 

3) d_ij is relationship strength measure, calculated by UDL1, 
UDL2, or UDL3. 

4) F[j] is the frequency at which stem j appears in the data. 

5) Weight_mode is a user controlled value that determines 
the influence that F[j] has on the U. If weight_mode is 
FREQ then the frequency of stem j directly affects U, so 
that higher frequency stems (merchant names) strongly 
influence the resulting merchant vector of merchant i. A 
slower influence is provided by weight_mode=LOG_FREQ, 
which uses the log of F[j]. If weight_mode is not 
set, then the default is no influence by F[j]. 

6) Gamma is a learning rate 0-1, typically 0.5 to 0.9 

APPENDIX III: Algebraically Derived Vector 
Training Algorithm 

Initialize: 

For each stem, i ∈ {all stems in corpus} 
    V_i = rand_vector // initialize a random vector for stem i 
    Normalize V_i // normalize vector to unit length 
    ΔV_i = 0 // define a zero initialized update vector for stem i 
END 

For each stem, i ∈ {all stems in corpus} 
    Calculate Updates: 

    For each stem, j ∈ {all stems that co-occurred with stem i}, j ≠ i 

        // this is all merchants j which co-occur with merchant i 

        We wish to calculate a new vector, U_ij, that is the ideal 
        position of V_i with respect to V_j. In other words, we 
        want the dot product of U_ij with V_j to be d_ij, we want 
        U_ij to have unit length, and we want U_ij to lie in the 
        plane defined by V_i and V_j. 

        U_ij can be expressed as a linear combination of V_i and 
        V_j where: 

        [equation illegible] 

        and 

        [equation illegible] 

        We construct a weighted sum of the U_ij for all j to 
        derive an estimate of where V_i should be. 
        IF weight_mode == LOG_FREQ THEN 
            ΔV_i = ΔV_i + U_ij · [1 − dot(U_ij, V_i)] · [1 + log F[j]] 
        ELSE IF weight_mode == FREQ THEN 
            ΔV_i = ΔV_i + U_ij · [1 − dot(U_ij, V_i)] · F[j] 
        ELSE 
            ΔV_i = ΔV_i + U_ij · [1 − dot(U_ij, V_i)] 
        END IF 
    END j 

    Perform Update: 

    V_i^new = (1 − gamma) · V_i + gamma · ΔV_i 
END i 

Notes: 

1) Stems here are root merchant names. 

2) The list of stems j (merchant names) which co-occur with 
stem i is known from the co-occurrence data. 

3) d_ij is relationship strength measure, calculated by UDL1, 
UDL2, or UDL3. 

4) F[j] is the frequency at which stem j appears in the data. 

5) Weight_mode is a user controlled value that determines 
the influence that F[j] has on the U. If weight_mode is 
FREQ then the frequency of stem j directly affects U, so 
that higher frequency stems (merchant names) strongly 
influence the resulting merchant vector of merchant i. A 
slower influence is provided by weight_mode=LOG_FREQ, 
which uses the log of F[j]. If weight_mode is not 
set, then the default is no influence by F[j]. 

6) Gamma is a learning rate 0-1, typically 0.5 to 0.9 
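
The update loop shared by Appendices II and III can be sketched as below. This is an illustrative reading, not the patented implementation. `ideal_geometric` follows the Appendix II construction and assumes d_ij is nonzero with magnitude at most one. `ideal_algebraic` uses coefficients derived here from the three stated constraints (unit length, dot product d_ij with V_j, and lying in the plane of V_i and V_j); they are an assumption and are not taken from the appendix text. Both functions assume unit-length inputs that are not parallel, and all function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def axpy(a, x, y):
    """Return a*x + y for scalar a and equal-length vectors x, y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def ideal_geometric(v_i, v_j, d_ij):
    """Appendix II style: build U_ij from V_j and the orthogonal direction."""
    diff = axpy(-1.0, v_j, v_i)                          # D = V_i - V_j
    theta = normalize(axpy(-dot(v_j, diff), v_j, diff))  # part of D orthogonal to V_j
    l = math.sqrt((1.0 - d_ij ** 2) / d_ij ** 2)         # weight for theta
    base = v_j if d_ij > 0 else [-x for x in v_j]
    return normalize(axpy(l, theta, base))


def ideal_algebraic(v_i, v_j, d_ij):
    """Closed form U_ij = a*V_i + b*V_j (coefficients derived, not sourced)."""
    eps = dot(v_i, v_j)
    a = math.sqrt((1.0 - d_ij ** 2) / (1.0 - eps ** 2))
    b = d_ij - a * eps
    return axpy(a, v_i, [b * x for x in v_j])


def update_vector(v_i, neighbors, gamma=0.5, weight_mode=None):
    """One update of V_i; `neighbors` is a list of (V_j, d_ij, F[j])."""
    delta = [0.0] * len(v_i)
    for v_j, d_ij, freq_j in neighbors:
        u = ideal_geometric(v_i, v_j, d_ij)
        w = 1.0 - dot(u, v_i)  # error weight: large when V_i is far from U_ij
        if weight_mode == "LOG_FREQ":
            w *= 1.0 + math.log(freq_j)
        elif weight_mode == "FREQ":
            w *= freq_j
        delta = axpy(w, u, delta)
    return axpy(gamma, delta, [(1.0 - gamma) * x for x in v_i])


def random_unit_vector(dim, rng):
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
```

Under these assumptions the two constructions return the same vector, which gives a quick sanity check on either one. The sketch does not renormalize the updated vector, because the pseudocode shows no such step.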
We claim: 

1. A method of predicting financial behavior of 
consumers, comprising: 

generating from transaction data for a plurality of 
consumers, a date ordered sequence of transactions for 
each consumer; 

selecting for each consumer a set of the date ordered 
transactions to form a group of input transactions for 
the consumer; and 

for each consumer, applying the input transactions of the 
consumer to each of a plurality of merchant segment 
predictive models, each merchant segment predictive 
model defining for a group of merchants a prediction 
function between input transactions in a past time 
interval and predicted spending in a subsequent time 
interval, to produce for each consumer a predicted 
spending amount in each merchant segment. 

2. The method of claim 1, further comprising: 

for each consumer, associating the consumer with the 
merchant segment for which the consumer had the 
highest predicted spending relative to other merchant 
segments. 

3. The method of claim 1, further comprising: 

for each merchant segment, determining a segment vector 
as a summary vector of merchant vectors of merchants 
associated with the segment; and 

for each consumer, associating the consumer with the 
merchant segment having the greatest dot product 
between the segment vector of the segment and a 
consumer vector of the consumer. 

4. The method of claim 1, further comprising: 
for each merchant segment: 

ranking the consumers by their predicted spending in 
the merchant segment; 

determining for each consumer a percentile ranking in 
the merchant segment; 
for each consumer: 

determining the merchant segment in which the con- 
sumer's percentile ranking is the highest, to uniquely 
associate each consumer with one merchant seg- 
ment; and 

for each merchant segment, determining summary trans- 
action statistics for the consumers uniquely associated 
with the merchant segment. 
5. The method of claim 1, further comprising: 
for each merchant segment: 

ranking the consumers by their predicted spending in 
the merchant segment; 

determining for each consumer a percentile ranking in 
the merchant segment; 

selecting as a population, the consumers having a 
percentile ranking in excess of a predetermined 
percentile threshold; and 

determining summary transaction statistics for the selected 
population of consumers. 

6. The method of claim 1, further comprising: 
establishing for each merchant in the transaction data a 

merchant vector; and 
updating the merchant vector of each merchant relative to 
the merchant vectors of other merchants according to 
co-occurrences of each merchant in the transaction 
data. 

7. The method of claim 6, further comprising: 
updating the merchant vector of each merchant based 

upon an unexpected amount deviation in a frequency of 
co-occurrence of the merchant with other merchants. 

8. The method of claim 6, further comprising: 
determining a co-occurrence frequency for each merchant 

with each other merchant in the transaction data; 

determining for each pair of merchants, a relationship 
strength between the pair of merchants based on how 
much the determined co-occurrence frequency deviates 
from an expected co-occurrence frequency; 

for each pair of merchant vectors, mapping the relationship 
strength into a vector space as a desired dot 
product between the respective merchant vectors of the 
merchants in the pair; and 

updating each merchant vector so that the actual dot 
products between each pair of merchant vectors 
approximate the desired dot product between the 
merchant vectors. 

9. The method of claim 8, wherein determining for each 
pair of merchants, a relationship strength between the pair of 
merchants further comprises: 

determining the relationship strength by 
where 

$r_{ij}$ is the relationship strength between merchant_i and 
merchant_j in a pair of merchants; 
$T_{ij}$ is the actual co-occurrence frequency of merchant_i and 
merchant_j in the transaction data; and 
$\hat{T}_{ij}$ is the expected co-occurrence frequency of merchant_i 
and merchant_j in the transaction data. 
10. The method of claim 8, wherein determining for each 
pair of merchants, a relationship strength between the pair of 
merchants further comprises: 
determining the relationship strength by 



$$r_{ij} = \mathrm{sign}(T_{ij} - \hat{T}_{ij}) \cdot \sqrt{2 \ln \lambda}$$
where 

$r_{ij}$ is the relationship strength between merchant_i and 
merchant_j in a pair of merchants; 
$\lambda$ is a log-likelihood ratio; 
$T_{ij}$ is the actual co-occurrence frequency of merchant_i and 
merchant_j in the transaction data; and 

$\hat{T}_{ij}$ is the expected co-occurrence frequency of merchant_i 
and merchant_j in the transaction data. 

11. The method of claim 8, wherein determining for each 
pair of merchants, a relationship strength between the pair of 
merchants further comprises: 

determining the relationship strength by 



r y - = sign(T u - f 9 ) ^2lnA / yff^ = sig^j - 1 9 ) - V^T • f ^ 

where 
$r_{ij}$ is the relationship strength between merchant_i and 
merchant_j in a pair of merchants; 
$\lambda$ is a log-likelihood ratio; 
$T_{ij}$ is the actual co-occurrence frequency of merchant_i and 
merchant_j in the transaction data; and 
$\hat{T}_{ij}$ is the expected co-occurrence frequency of merchant_i 
and merchant_j in the transaction data. 

12. The method of claim 8, wherein updating each merchant 
vector so that the actual dot products between each 
pair of merchant vectors approximate the desired dot product 
between the merchant vectors comprises a gradient 
descent update that updates the merchant vectors according 
to whether the actual dot product between them is greater or 
lesser than the desired dot product. 

13. The method of claim 8, wherein updating each merchant 
vector so that the actual dot products between each 
pair of merchant vectors approximate the desired dot product 
between the merchant vectors comprises determining for 
each merchant vector an error weighted average of the 
desired positions of the merchant vector from the current 
position of each other merchant vector and the desired dot 
product between the merchant vector and each other merchant 
vector. 

14. The method of claim 1, further comprising: 
determining for each merchant name in the transaction 
data a merchant vector; 

clustering the merchant vectors to form a plurality of 
merchant segments, wherein each merchant vector is 
associated with one and only one merchant segment; 

for each merchant segment, determining from the transactions 
of consumers at the associated merchants of the 
merchant segment, statistical measures of consumer transactions 
in the segment. 




15. The method of claim 1, further comprising: 
selecting a plurality of consumers associated with at least 

one merchant segment, the selected plurality selected 
according to their predicted spending in the merchant 
segment; and 

providing promotional offers to the selected plurality of 
consumers. 

16. The method of claim 1, further comprising: 
training each of the merchant segment predictive models 

to predict spending in a predicted time period based 
upon transaction statistics of the consumer's transac- 
tions in a past time period. 

17. The method of claim 16, wherein the transaction 
statistics comprise variables describing the recency of the 
consumer's transactions in one or more merchant segments, 
the frequency of the consumer's transactions in one or more 
merchant segments, and the amount of the consumer's 
transactions in one or more merchant segments. 

18. A system for predicting consumer financial behavior, 
comprising: 

a plurality of merchant segments, each merchant segment 
having a set of merchants associated therewith; 

a plurality of merchant segment predictive models, each 
model associated with one of the merchant segments 
for predicting spending by an individual consumer in 
the merchant segment in a predicted time period as a 
function of transaction statistics of the consumer for 
transactions in a prior time period; and 

a data processing module that receives transaction data for 
a consumer, and constructs the transaction statistics for 
the prior time period for input into selected ones of the 
merchant segment predictive models. 

19. A system for forming merchant segments, comprising: 
a data processing module that receives consumer trans- 
action data for a plurality of consumer accounts, and 
organizes the transaction data by account, and within 
account, sequences the transactions by time; 

a data processing module that determines from the 
sequenced transaction data an expected frequency of 
co-occurrence for each merchant, and that constructs 
for each merchant a merchant vector as a function of 
unexpected frequency of co-occurrences of the mer- 
chant; and 

a clustering module that clusters the merchant vectors into 
merchant segments by determining merchant vectors 
that are closely aligned with each other. 